Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Adding Span API perf tests for comparison with arrays and to monitor progress#9790

Merged
ahsonkhan merged 7 commits into
dotnet:masterfrom
ahsonkhan:SpanAPITests
Mar 2, 2017
Merged

Adding Span API perf tests for comparison with arrays and to monitor progress#9790
ahsonkhan merged 7 commits into
dotnet:masterfrom
ahsonkhan:SpanAPITests

Conversation

@ahsonkhan
Copy link
Copy Markdown

No description provided.

@DrewScoggins
Copy link
Copy Markdown
Member

test Windows_NT_x64 perf

TestQuickSortSpan(span);
for (int i = 0; i < QuickSortIterations; i++)
{
Array.Copy(unsortedData, data, Size);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing the copy inside the timing loop? Seems like we would just want to be running the actual quicksort algorithm.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment for all sort tests.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, all these sort tests are part of the pre-existing span benchmark, which I'm responsible for.

You have to reset the array to the same initial unsorted state or it won't behave the same on all iterations. I don't see any harm in having the copy be part of the timed loop. We don't really intend to measure just the time sorting; the point of these tests is to contrast span based approaches with array based approaches (with the goal that eventually there is no penalty for using Span). As long as TestArray and TestSpan are doing the same work per iteration then the tests are measuring something interesting.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK that makes sense. I am just wondering if the copy time dominates the execution time (I suppose I should just profile and find out 😄).

@DrewScoggins
Copy link
Copy Markdown
Member

It looks like this is not building. When I tried to do a perf run I am getting some syntax errors. Is this relying on some code that had not been checked in yet?

-69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(544,41): error CS1525: Invalid expression term 'ref' [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(544,41): error CS1003: Syntax error, ',' expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(544,45): error CS1002: ; expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(564,22): error CS1513: } expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(565,40): error CS1525: Invalid expression term 'ref' [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(565,40): error CS1003: Syntax error, ',' expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(565,44): error CS1002: ; expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(585,22): error CS1513: } expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(586,43): error CS1525: Invalid expression term 'ref' [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(586,43): error CS1003: Syntax error, ',' expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(586,47): error CS1002: ; expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(606,22): error CS1513: } expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(607,47): error CS1525: Invalid expression term 'ref' [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(607,47): error CS1003: Syntax error, ',' expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj] 15:20:38 SpanBench.cs(607,51): error CS1002: ; expected [C:\Jenkins\workspace\perf_perflab_---69557540\tests\src\JIT\Performance\CodeQuality\Span\SpanBench.csproj]

var span = new Span<byte>(array);
foreach (var iteration in Benchmark.Iterations)
using (iteration.StartMeasurement())
for (int i = 0; i < Benchmark.InnerIterationCount; i++)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a general comment for all of these, but how long are we running each test for? We have found on the perf team that having tests that run for a very short amount of time (milliseconds). We normally try and shoot for about .5-1 seconds.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests that were part of the pre-existing set are tuned that way in this change -- nominal times measured locally are 0.5 - 3 seconds. Not all at 1 second since the bubble and quick sorts are doing the same work and bubble sort is slower.

Copy link
Copy Markdown
Author

@ahsonkhan ahsonkhan Mar 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually reduced the time it takes for each test to run since 0.5-3 seconds seem way too much for running hundreds of tests. Even now it takes ~5 minutes to run the tests. If each iteration of the test takes 0.5 seconds (rather than say 0.5 ms), it would take a really long time to run these tests (especially after the tests I added).

@DrewScoggins
Copy link
Copy Markdown
Member

I have a more general question. It seems like we have added a lot of new tests including many tests that do the same thing across different types and doing various actions across a large swath of sizes. I assume for the different types that each one exercises different code paths or because they have different sizes they could have different behaviors do to alignment issues. But it seems to me that we might be able to reduce the number of size variations we are using. Do we know how we expect the time to grow with the size? If we could pin that down we could then do a smaller set of sizes and just make sure that we are hitting those numbers.

@ahsonkhan
Copy link
Copy Markdown
Author

ahsonkhan commented Mar 1, 2017

It looks like this is not building. When I tried to do a perf run I am getting some syntax errors. Is this relying on some code that had not been checked in yet?

It requires VS 2017 and C# changes which support ref returns. I tested in VS 2017 developer command prompt and it builds fine. In VS 2015, I got those errors too.

@ahsonkhan
Copy link
Copy Markdown
Author

ahsonkhan commented Mar 1, 2017

I have a more general question. It seems like we have added a lot of new tests including many tests that do the same thing across different types and doing various actions across a large swath of sizes. I assume for the different types that each one exercises different code paths or because they have different sizes they could have different behaviors do to alignment issues. But it seems to me that we might be able to reduce the number of size variations we are using. Do we know how we expect the time to grow with the size? If we could pin that down we could then do a smaller set of sizes and just make sure that we are hitting those numbers.

In some cases, different code paths are being executed for different sizes along with data types (see example below). However, I believe I can reduce the number of test cases.

I could do 1, 10, 100, 1000, some large X, which gets us 5 instead of 8.

Example:
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Span.cs#L844

Looking at Span.Clear(), we have a case for <= 22, > 22, < 512 and >= 512, so we can't really do less than 3 different test cases. Having a case for size 1 and some large X provides other insights.

@AndyAyersMS
Copy link
Copy Markdown
Member

Drew can probably help you sort this out, but there's not a simple relationship between the time required for an iteration and time required to run a benchmark. Short iteration times inspire xunit-perf to run more iterations overall. Look at the raw perf data and see how many iterations your tests are inspiring.

My opinion is you are better off aiming for larger iteration times and fewer overall iterations. Having ~1 second iterations typically leads to 11 iterations and so about 10 seconds per benchmark.

Maybe shorter overall benchmarks are appropriate for pure API tests? Not sure -- we don't have a lot of experience tracking the variance of very short run. We'd have to look at a few dozen runs worth of data to have a better idea.

@ahsonkhan
Copy link
Copy Markdown
Author

ahsonkhan commented Mar 1, 2017

Maybe shorter overall benchmarks are appropriate for pure API tests? Not sure -- we don't have a lot of experience tracking the variance of very short run. We'd have to look at a few dozen runs worth of data to have a better idea.

I would like to start with small runtime and increase the iterations as needed. I have had fairly consistent results even when the test time is relatively small.

My opinion is you are better off aiming for larger iteration times and fewer overall iterations. Having ~1 second iterations typically leads to 11 iterations and so about 10 seconds per benchmark.

Look at the raw perf data and see how many iterations your tests are inspiring.

What I have noticed is that if my BaseIteration is < 100 (say 1-10), the test runs 1001 iterations. when my Base Iteration is > 10000, it runs 100 iterations. Even at 100, it was still running 100 iterations. That is why I kept it at 100. I tried larger/smaller inner iterations and the total number of iterations for the test didn't go below 100. If for some really large inner iteration count we get 11 iterations, it doesn't become worth it since the runtime becomes too high, regardless.

but there's not a simple relationship between the time required for an iteration and time required to run a benchmark

I understand that. I was trying to reduce time required to run a benchmark. I tried to increase the time required for an iteration while until the time required to run the benchmark became too large.

Drew can probably help you sort this out.

@DrewScoggins thoughts?

@DrewScoggins
Copy link
Copy Markdown
Member

So under the covers in xunit performance we have two caps that will stop the collection, number of iterations and time. The number of iterations is by default capped at 1000, and the amount of time is capped at 10 seconds. So if we ever exceed either of those values then we will stop running the benchmark. The number of iterations that can be used when you use the InnerIterations feature is set lower at I believe 100 total outer iterations.

So now we have two factors that make us want to have longer times per outer iteration. The first is purely operational. Benchview, the site we use to visualize performance data, is less responsive when every test has 1000 iterations. The second, and more important reason to increase the time per outer iteration, is for the stability of the generated data. In the past we have found that having outer iterations take between .5-1 second generate considerably more reliable numbers build to build. This still leaves us with the issue of long run times however.

To get that under control I still think that the best thing we can do it to try and trim down the number of total tests that we are running.

In some cases, different code paths are being executed for different sizes along with data types (see example below). However, I believe I can reduce the number of test cases.

I could do 1, 10, 100, 1000, some large X, which gets us 5 instead of 8.

Example:
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Span.cs#L844

Looking at Span.Clear(), we have a case for <= 22, > 22, < 512 and >= 512, so we can't really do less than 3 different test cases. Having a case for size 1 and some large X provides other insights.

Looking at this code it seems to me that we could easily get away with only 1, 10 and 1000. That would cover all of the cases in the code. If we wanted to include a very large value we could look into it, but I am not sure what value we get out of it. We can also remove the byte and Struct specific tests. From looking at Clear (I have not looked at Fill and others, but I suspect we could get wins there as well) we just take the size of the value type and multiply it by the length of the Span. There is nothing special about all of the different value types that we use so we should just delete them. We will need to keep String obviously as it is a ref type and uses a different code path.

If after doing that kind of work to reduce the number of tests we still think that we would be running to long if we were taking 1 second per iteration we have some stuff we can do. First, we can tune these tests to run somewhere between 50-100 milliseconds. This should give us a good balance between actual coverage and time. The second thing that we will need to do is setup xunit performance to only run a max of ten iterations. We have never had to deal with this before because we have always run for about 1 second and so we got the ten iterations for free from the timing restrictions, but we will need to set this up. It should not be difficult, but it would be perturbing the current lab setup that we have and would require possible rebaselining, so I would like to prevent this if at all possible.

@DrewScoggins
Copy link
Copy Markdown
Member

It looks like this is not building. When I tried to do a perf run I am getting some syntax errors. Is this relying on some code that had not been checked in yet?

It requires VS 2017 and C# changes which support ref returns. I tested in VS 2017 developer command prompt and it builds fine. In VS 2015, I got those errors too.

We should not be checking in things that only work and build using VS 2017. We currently specify in the build instructions that you only need VS 2015

  • Visual Studio 2015 (Community, Professional, Enterprise). The community version is completely free.

@ahsonkhan
Copy link
Copy Markdown
Author

We should not be checking in things that only work and build using VS 2017. We currently specify in the build instructions that you only need VS 2015

The change to the csproj that I just made should enable you to pickup the latest compiler now and it should compile in VS 2015 as well.

@ahsonkhan
Copy link
Copy Markdown
Author

Looking at this code it seems to me that we could easily get away with only 1, 10 and 1000.

1, 100, 1000 sure, but not 10 (we need one for the case between 23-511). However, I think a general 1, 10, 100, 1000 benchmark test are good since it isn't tied too closely to the implementation. Why are we trying to reduce the test cases and tuning them to the specific implementation?

We can also remove the byte and Struct specific tests.
We will need to keep String obviously as it is a ref type and uses a different code path.

Agreed, but I would rather remove int instead of byte since many of the scenarios that use Span use them as bytes.

@DrewScoggins
Copy link
Copy Markdown
Member

Yeah, I have no preference on which of the value types we keep around to run on. Byte seems just as good as any to me.

I am trying to reduce the number of tests because each test that we add is an actual cost. Not only in machine time, but also in cognitive load when we are looking at results. So removing tests that test something else we have already tested is, in my opinion, just goodness and work we should do.

I am not trying to tie the tests to close to the implementation, but I do want us to think deeply about what tests we are adding, what coverage they provide that no other test does, and removing tests that don't meet that criteria.

@ahsonkhan
Copy link
Copy Markdown
Author

ahsonkhan commented Mar 2, 2017

I have increased the inner iteration count so all test iterations take > 1ms. The tests total runtime is over 7 minutes now.

SpanBench Total: 124, Errors: 0, Failed: 0, Skipped: 0, Time: 471.461s

@ahsonkhan ahsonkhan merged commit 1bf3bbb into dotnet:master Mar 2, 2017
@ahsonkhan ahsonkhan deleted the SpanAPITests branch March 2, 2017 04:32
jorive pushed a commit to guhuro/coreclr that referenced this pull request May 4, 2017
…progress (dotnet#9790)

* wip

* Adding Span API perf tests for comparison with arrays and to monitor progress

* Reducing iteration count so tests finish in < 2 minutes

* Reducing the number of test cases

* Commenting out ref DangerousGetPinnableReference test

* Updated csproj to use latest compiler

* Tuning test cases and runtime
@karelz karelz modified the milestone: 2.0.0 Aug 28, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…progress (dotnet/coreclr#9790)

* wip

* Adding Span API perf tests for comparison with arrays and to monitor progress

* Reducing iteration count so tests finish in < 2 minutes

* Reducing the number of test cases

* Commenting out ref DangerousGetPinnableReference test

* Updated csproj to use latest compiler

* Tuning test cases and runtime


Commit migrated from dotnet/coreclr@1bf3bbb
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants