Adding Span API perf tests for comparison with arrays and to monitor progress by ahsonkhan · Pull Request #9790 · dotnet/coreclr

ahsonkhan · 2017-02-25T10:04:24Z

No description provided.

…progress

DrewScoggins · 2017-02-26T22:55:06Z

test Windows_NT_x64 perf

DrewScoggins · 2017-02-26T22:56:33Z

-                    TestQuickSortSpan(span);
+                    for (int i = 0; i < QuickSortIterations; i++)
+                    {
+                        Array.Copy(unsortedData, data, Size);


Why are we doing the copy inside the timing loop? Seems like we would just want to be running the actual quicksort algorithm.

Same comment for all sort tests.

FYI, all these sort tests are part of the pre-existing span benchmark, which I'm responsible for.

You have to reset the array to the same initial unsorted state or it won't behave the same on all iterations. I don't see any harm in having the copy be part of the timed loop. We don't really intend to measure just the time sorting; the point of these tests is to contrast span based approaches with array based approaches (with the goal that eventually there is no penalty for using Span). As long as TestArray and TestSpan are doing the same work per iteration then the tests are measuring something interesting.

OK that makes sense. I am just wondering if the copy time dominates the execution time (I suppose I should just profile and find out 😄).

DrewScoggins · 2017-02-26T23:30:57Z

It looks like this is not building. When I tried to do a perf run I am getting some syntax errors. Is this relying on some code that had not been checked in yet?

-69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(544,41): error CS1525: Invalid expression term 'ref' [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(544,41): error CS1003: Syntax error, ',' expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(544,45): error CS1002: ; expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(564,22): error CS1513: } expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(565,40): error CS1525: Invalid expression term 'ref' [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(565,40): error CS1003: Syntax error, ',' expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(565,44): error CS1002: ; expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(585,22): error CS1513: } expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(586,43): error CS1525: Invalid expression term 'ref' [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(586,43): error CS1003: Syntax error, ',' expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(586,47): error CS1002: ; expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(606,22): error CS1513: } expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(607,47): error CS1525: Invalid expression term 'ref' [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(607,47): error CS1003: Syntax error, ',' expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(607,51): error CS1002: ; expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj]

DrewScoggins · 2017-02-26T23:33:31Z

+            var span = new Span<byte>(array);
+            foreach (var iteration in Benchmark.Iterations)
+                using (iteration.StartMeasurement())
+                    for (int i = 0; i < Benchmark.InnerIterationCount; i++)


This is a general comment for all of these, but how long are we running each test for? We have found on the perf team that having tests that run for a very short amount of time (milliseconds). We normally try and shoot for about .5-1 seconds.

The tests that were part of the pre-existing set are tuned that way in this change -- nominal times measured locally are 0.5 - 3 seconds. Not all at 1 second since the bubble and quick sorts are doing the same work and bubble sort is slower.

I actually reduced the time it takes for each test to run since 0.5-3 seconds seem way too much for running hundreds of tests. Even now it takes ~5 minutes to run the tests. If each iteration of the test takes 0.5 seconds (rather than say 0.5 ms), it would take a really long time to run these tests (especially after the tests I added).

DrewScoggins · 2017-02-26T23:47:29Z

I have a more general question. It seems like we have added a lot of new tests including many tests that do the same thing across different types and doing various actions across a large swath of sizes. I assume for the different types that each one exercises different code paths or because they have different sizes they could have different behaviors do to alignment issues. But it seems to me that we might be able to reduce the number of size variations we are using. Do we know how we expect the time to grow with the size? If we could pin that down we could then do a smaller set of sizes and just make sure that we are hitting those numbers.

ahsonkhan · 2017-03-01T21:25:56Z

It looks like this is not building. When I tried to do a perf run I am getting some syntax errors. Is this relying on some code that had not been checked in yet?

It requires VS 2017 and C# changes which support ref returns. I tested in VS 2017 developer command prompt and it builds fine. In VS 2015, I got those errors too.

ahsonkhan · 2017-03-01T21:31:44Z

I have a more general question. It seems like we have added a lot of new tests including many tests that do the same thing across different types and doing various actions across a large swath of sizes. I assume for the different types that each one exercises different code paths or because they have different sizes they could have different behaviors do to alignment issues. But it seems to me that we might be able to reduce the number of size variations we are using. Do we know how we expect the time to grow with the size? If we could pin that down we could then do a smaller set of sizes and just make sure that we are hitting those numbers.

In some cases, different code paths are being executed for different sizes along with data types (see example below). However, I believe I can reduce the number of test cases.

I could do 1, 10, 100, 1000, some large X, which gets us 5 instead of 8.

Example:
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Span.cs#L844

Looking at Span.Clear(), we have a case for <= 22, > 22, < 512 and >= 512, so we can't really do less than 3 different test cases. Having a case for size 1 and some large X provides other insights.

AndyAyersMS · 2017-03-01T22:07:25Z

Drew can probably help you sort this out, but there's not a simple relationship between the time required for an iteration and time required to run a benchmark. Short iteration times inspire xunit-perf to run more iterations overall. Look at the raw perf data and see how many iterations your tests are inspiring.

My opinion is you are better off aiming for larger iteration times and fewer overall iterations. Having ~1 second iterations typically leads to 11 iterations and so about 10 seconds per benchmark.

Maybe shorter overall benchmarks are appropriate for pure API tests? Not sure -- we don't have a lot of experience tracking the variance of very short run. We'd have to look at a few dozen runs worth of data to have a better idea.

ahsonkhan · 2017-03-01T22:33:15Z

Maybe shorter overall benchmarks are appropriate for pure API tests? Not sure -- we don't have a lot of experience tracking the variance of very short run. We'd have to look at a few dozen runs worth of data to have a better idea.

I would like to start with small runtime and increase the iterations as needed. I have had fairly consistent results even when the test time is relatively small.

My opinion is you are better off aiming for larger iteration times and fewer overall iterations. Having ~1 second iterations typically leads to 11 iterations and so about 10 seconds per benchmark.

Look at the raw perf data and see how many iterations your tests are inspiring.

What I have noticed is that if my BaseIteration is < 100 (say 1-10), the test runs 1001 iterations. when my Base Iteration is > 10000, it runs 100 iterations. Even at 100, it was still running 100 iterations. That is why I kept it at 100. I tried larger/smaller inner iterations and the total number of iterations for the test didn't go below 100. If for some really large inner iteration count we get 11 iterations, it doesn't become worth it since the runtime becomes too high, regardless.

but there's not a simple relationship between the time required for an iteration and time required to run a benchmark

I understand that. I was trying to reduce time required to run a benchmark. I tried to increase the time required for an iteration while until the time required to run the benchmark became too large.

Drew can probably help you sort this out.

@DrewScoggins thoughts?

DrewScoggins · 2017-03-02T01:05:30Z

So under the covers in xunit performance we have two caps that will stop the collection, number of iterations and time. The number of iterations is by default capped at 1000, and the amount of time is capped at 10 seconds. So if we ever exceed either of those values then we will stop running the benchmark. The number of iterations that can be used when you use the InnerIterations feature is set lower at I believe 100 total outer iterations.

So now we have two factors that make us want to have longer times per outer iteration. The first is purely operational. Benchview, the site we use to visualize performance data, is less responsive when every test has 1000 iterations. The second, and more important reason to increase the time per outer iteration, is for the stability of the generated data. In the past we have found that having outer iterations take between .5-1 second generate considerably more reliable numbers build to build. This still leaves us with the issue of long run times however.

To get that under control I still think that the best thing we can do it to try and trim down the number of total tests that we are running.

In some cases, different code paths are being executed for different sizes along with data types (see example below). However, I believe I can reduce the number of test cases.

I could do 1, 10, 100, 1000, some large X, which gets us 5 instead of 8.

Example:
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Span.cs#L844

Looking at Span.Clear(), we have a case for <= 22, > 22, < 512 and >= 512, so we can't really do less than 3 different test cases. Having a case for size 1 and some large X provides other insights.

Looking at this code it seems to me that we could easily get away with only 1, 10 and 1000. That would cover all of the cases in the code. If we wanted to include a very large value we could look into it, but I am not sure what value we get out of it. We can also remove the byte and Struct specific tests. From looking at Clear (I have not looked at Fill and others, but I suspect we could get wins there as well) we just take the size of the value type and multiply it by the length of the Span. There is nothing special about all of the different value types that we use so we should just delete them. We will need to keep String obviously as it is a ref type and uses a different code path.

If after doing that kind of work to reduce the number of tests we still think that we would be running to long if we were taking 1 second per iteration we have some stuff we can do. First, we can tune these tests to run somewhere between 50-100 milliseconds. This should give us a good balance between actual coverage and time. The second thing that we will need to do is setup xunit performance to only run a max of ten iterations. We have never had to deal with this before because we have always run for about 1 second and so we got the ten iterations for free from the timing restrictions, but we will need to set this up. It should not be difficult, but it would be perturbing the current lab setup that we have and would require possible rebaselining, so I would like to prevent this if at all possible.

DrewScoggins · 2017-03-02T01:08:28Z

It looks like this is not building. When I tried to do a perf run I am getting some syntax errors. Is this relying on some code that had not been checked in yet?

It requires VS 2017 and C# changes which support ref returns. I tested in VS 2017 developer command prompt and it builds fine. In VS 2015, I got those errors too.

We should not be checking in things that only work and build using VS 2017. We currently specify in the build instructions that you only need VS 2015

Visual Studio 2015 (Community, Professional, Enterprise). The community version is completely free.

ahsonkhan · 2017-03-02T01:15:25Z

We should not be checking in things that only work and build using VS 2017. We currently specify in the build instructions that you only need VS 2015

The change to the csproj that I just made should enable you to pickup the latest compiler now and it should compile in VS 2015 as well.

ahsonkhan · 2017-03-02T01:22:09Z

Looking at this code it seems to me that we could easily get away with only 1, 10 and 1000.

1, 100, 1000 sure, but not 10 (we need one for the case between 23-511). However, I think a general 1, 10, 100, 1000 benchmark test are good since it isn't tied too closely to the implementation. Why are we trying to reduce the test cases and tuning them to the specific implementation?

We can also remove the byte and Struct specific tests.
We will need to keep String obviously as it is a ref type and uses a different code path.

Agreed, but I would rather remove int instead of byte since many of the scenarios that use Span use them as bytes.

DrewScoggins · 2017-03-02T01:34:32Z

Yeah, I have no preference on which of the value types we keep around to run on. Byte seems just as good as any to me.

I am trying to reduce the number of tests because each test that we add is an actual cost. Not only in machine time, but also in cognitive load when we are looking at results. So removing tests that test something else we have already tested is, in my opinion, just goodness and work we should do.

I am not trying to tie the tests to close to the implementation, but I do want us to think deeply about what tests we are adding, what coverage they provide that no other test does, and removing tests that don't meet that criteria.

ahsonkhan · 2017-03-02T02:52:22Z

I have increased the inner iteration count so all test iterations take > 1ms. The tests total runtime is over 7 minutes now.

SpanBench Total: 124, Errors: 0, Failed: 0, Skipped: 0, Time: 471.461s

…progress (dotnet#9790) * wip * Adding Span API perf tests for comparison with arrays and to monitor progress * Reducing iteration count so tests finish in < 2 minutes * Reducing the number of test cases * Commenting out ref DangerousGetPinnableReference test * Updated csproj to use latest compiler * Tuning test cases and runtime

…progress (dotnet/coreclr#9790) * wip * Adding Span API perf tests for comparison with arrays and to monitor progress * Reducing iteration count so tests finish in < 2 minutes * Reducing the number of test cases * Commenting out ref DangerousGetPinnableReference test * Updated csproj to use latest compiler * Tuning test cases and runtime Commit migrated from dotnet/coreclr@1bf3bbb

dnfclas added the cla-already-signed label Feb 25, 2017

ahsonkhan added 3 commits February 25, 2017 02:05

wip

804ccb9

Adding Span API perf tests for comparison with arrays and to monitor …

f55346a

…progress

Reducing iteration count so tests finish in < 2 minutes

cbc6187

ahsonkhan force-pushed the SpanAPITests branch from 83b2c7d to cbc6187 Compare February 25, 2017 10:05

AndyAyersMS mentioned this pull request Feb 25, 2017

Fix SpanBench #9783

Closed

DrewScoggins reviewed Feb 26, 2017

View reviewed changes

Reducing the number of test cases

8a93c09

ahsonkhan added 2 commits March 1, 2017 14:46

Commenting out ref DangerousGetPinnableReference test

4bb7582

Updated csproj to use latest compiler

530efaf

Tuning test cases and runtime

f98c3dd

ahsonkhan merged commit 1bf3bbb into dotnet:master Mar 2, 2017

ahsonkhan deleted the SpanAPITests branch March 2, 2017 04:32

karelz modified the milestone: 2.0.0 Aug 28, 2017

Conversation

ahsonkhan commented Feb 25, 2017

Uh oh!

DrewScoggins commented Feb 26, 2017

Uh oh!

DrewScoggins Feb 26, 2017

Choose a reason for hiding this comment

Uh oh!

DrewScoggins Feb 26, 2017

Choose a reason for hiding this comment

Uh oh!

AndyAyersMS Feb 27, 2017

Choose a reason for hiding this comment

Uh oh!

DrewScoggins Feb 27, 2017

Choose a reason for hiding this comment

Uh oh!

DrewScoggins commented Feb 26, 2017

Uh oh!

DrewScoggins Feb 26, 2017

Choose a reason for hiding this comment

Uh oh!

AndyAyersMS Feb 27, 2017

Choose a reason for hiding this comment

Uh oh!

ahsonkhan Mar 1, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DrewScoggins commented Feb 26, 2017

Uh oh!

ahsonkhan commented Mar 1, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ahsonkhan commented Mar 1, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AndyAyersMS commented Mar 1, 2017

Uh oh!

ahsonkhan commented Mar 1, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrewScoggins commented Mar 2, 2017

Uh oh!

DrewScoggins commented Mar 2, 2017

Uh oh!

ahsonkhan commented Mar 2, 2017

Uh oh!

ahsonkhan commented Mar 2, 2017

Uh oh!

DrewScoggins commented Mar 2, 2017

Uh oh!

ahsonkhan commented Mar 2, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ahsonkhan Mar 1, 2017 •

edited

Loading

ahsonkhan commented Mar 1, 2017 •

edited

Loading

ahsonkhan commented Mar 1, 2017 •

edited

Loading

ahsonkhan commented Mar 1, 2017 •

edited

Loading

ahsonkhan commented Mar 2, 2017 •

edited

Loading