Adding Span API perf tests for comparison with arrays and to monitor progress#9790
Conversation
83b2c7d to
cbc6187
Compare
|
test Windows_NT_x64 perf |
| TestQuickSortSpan(span); | ||
| for (int i = 0; i < QuickSortIterations; i++) | ||
| { | ||
| Array.Copy(unsortedData, data, Size); |
There was a problem hiding this comment.
Why are we doing the copy inside the timing loop? Seems like we would just want to be running the actual quicksort algorithm.
There was a problem hiding this comment.
Same comment for all sort tests.
There was a problem hiding this comment.
FYI, all these sort tests are part of the pre-existing span benchmark, which I'm responsible for.
You have to reset the array to the same initial unsorted state or it won't behave the same on all iterations. I don't see any harm in having the copy be part of the timed loop. We don't really intend to measure just the time sorting; the point of these tests is to contrast span based approaches with array based approaches (with the goal that eventually there is no penalty for using Span). As long as TestArray and TestSpan are doing the same work per iteration then the tests are measuring something interesting.
There was a problem hiding this comment.
OK that makes sense. I am just wondering if the copy time dominates the execution time (I suppose I should just profile and find out 😄).
|
It looks like this is not building. When I tried to do a perf run I am getting some syntax errors. Is this relying on some code that had not been checked in yet?
|
| var span = new Span<byte>(array); | ||
| foreach (var iteration in Benchmark.Iterations) | ||
| using (iteration.StartMeasurement()) | ||
| for (int i = 0; i < Benchmark.InnerIterationCount; i++) |
There was a problem hiding this comment.
This is a general comment for all of these, but how long are we running each test for? We have found on the perf team that having tests that run for a very short amount of time (milliseconds). We normally try and shoot for about .5-1 seconds.
There was a problem hiding this comment.
The tests that were part of the pre-existing set are tuned that way in this change -- nominal times measured locally are 0.5 - 3 seconds. Not all at 1 second since the bubble and quick sorts are doing the same work and bubble sort is slower.
There was a problem hiding this comment.
I actually reduced the time it takes for each test to run since 0.5-3 seconds seem way too much for running hundreds of tests. Even now it takes ~5 minutes to run the tests. If each iteration of the test takes 0.5 seconds (rather than say 0.5 ms), it would take a really long time to run these tests (especially after the tests I added).
|
I have a more general question. It seems like we have added a lot of new tests including many tests that do the same thing across different types and doing various actions across a large swath of sizes. I assume for the different types that each one exercises different code paths or because they have different sizes they could have different behaviors do to alignment issues. But it seems to me that we might be able to reduce the number of size variations we are using. Do we know how we expect the time to grow with the size? If we could pin that down we could then do a smaller set of sizes and just make sure that we are hitting those numbers. |
It requires VS 2017 and C# changes which support ref returns. I tested in VS 2017 developer command prompt and it builds fine. In VS 2015, I got those errors too. |
In some cases, different code paths are being executed for different sizes along with data types (see example below). However, I believe I can reduce the number of test cases. I could do 1, 10, 100, 1000, some large X, which gets us 5 instead of 8. Example: Looking at Span.Clear(), we have a case for <= 22, > 22, < 512 and >= 512, so we can't really do less than 3 different test cases. Having a case for size 1 and some large X provides other insights. |
|
Drew can probably help you sort this out, but there's not a simple relationship between the time required for an iteration and time required to run a benchmark. Short iteration times inspire xunit-perf to run more iterations overall. Look at the raw perf data and see how many iterations your tests are inspiring. My opinion is you are better off aiming for larger iteration times and fewer overall iterations. Having ~1 second iterations typically leads to 11 iterations and so about 10 seconds per benchmark. Maybe shorter overall benchmarks are appropriate for pure API tests? Not sure -- we don't have a lot of experience tracking the variance of very short run. We'd have to look at a few dozen runs worth of data to have a better idea. |
I would like to start with small runtime and increase the iterations as needed. I have had fairly consistent results even when the test time is relatively small.
What I have noticed is that if my BaseIteration is < 100 (say 1-10), the test runs 1001 iterations. when my Base Iteration is > 10000, it runs 100 iterations. Even at 100, it was still running 100 iterations. That is why I kept it at 100. I tried larger/smaller inner iterations and the total number of iterations for the test didn't go below 100. If for some really large inner iteration count we get 11 iterations, it doesn't become worth it since the runtime becomes too high, regardless.
I understand that. I was trying to reduce time required to run a benchmark. I tried to increase the time required for an iteration while until the time required to run the benchmark became too large.
@DrewScoggins thoughts? |
|
So under the covers in xunit performance we have two caps that will stop the collection, number of iterations and time. The number of iterations is by default capped at 1000, and the amount of time is capped at 10 seconds. So if we ever exceed either of those values then we will stop running the benchmark. The number of iterations that can be used when you use the InnerIterations feature is set lower at I believe 100 total outer iterations. So now we have two factors that make us want to have longer times per outer iteration. The first is purely operational. Benchview, the site we use to visualize performance data, is less responsive when every test has 1000 iterations. The second, and more important reason to increase the time per outer iteration, is for the stability of the generated data. In the past we have found that having outer iterations take between .5-1 second generate considerably more reliable numbers build to build. This still leaves us with the issue of long run times however. To get that under control I still think that the best thing we can do it to try and trim down the number of total tests that we are running.
Looking at this code it seems to me that we could easily get away with only 1, 10 and 1000. That would cover all of the cases in the code. If we wanted to include a very large value we could look into it, but I am not sure what value we get out of it. We can also remove the byte and Struct specific tests. From looking at Clear (I have not looked at Fill and others, but I suspect we could get wins there as well) we just take the size of the value type and multiply it by the length of the Span. There is nothing special about all of the different value types that we use so we should just delete them. We will need to keep String obviously as it is a ref type and uses a different code path. If after doing that kind of work to reduce the number of tests we still think that we would be running to long if we were taking 1 second per iteration we have some stuff we can do. First, we can tune these tests to run somewhere between 50-100 milliseconds. This should give us a good balance between actual coverage and time. The second thing that we will need to do is setup xunit performance to only run a max of ten iterations. We have never had to deal with this before because we have always run for about 1 second and so we got the ten iterations for free from the timing restrictions, but we will need to set this up. It should not be difficult, but it would be perturbing the current lab setup that we have and would require possible rebaselining, so I would like to prevent this if at all possible. |
We should not be checking in things that only work and build using VS 2017. We currently specify in the build instructions that you only need VS 2015
|
The change to the csproj that I just made should enable you to pickup the latest compiler now and it should compile in VS 2015 as well. |
1, 100, 1000 sure, but not 10 (we need one for the case between 23-511). However, I think a general 1, 10, 100, 1000 benchmark test are good since it isn't tied too closely to the implementation. Why are we trying to reduce the test cases and tuning them to the specific implementation?
Agreed, but I would rather remove int instead of byte since many of the scenarios that use Span use them as bytes. |
|
Yeah, I have no preference on which of the value types we keep around to run on. Byte seems just as good as any to me. I am trying to reduce the number of tests because each test that we add is an actual cost. Not only in machine time, but also in cognitive load when we are looking at results. So removing tests that test something else we have already tested is, in my opinion, just goodness and work we should do. I am not trying to tie the tests to close to the implementation, but I do want us to think deeply about what tests we are adding, what coverage they provide that no other test does, and removing tests that don't meet that criteria. |
|
I have increased the inner iteration count so all test iterations take > 1ms. The tests total runtime is over 7 minutes now. SpanBench Total: 124, Errors: 0, Failed: 0, Skipped: 0, Time: 471.461s |
…progress (dotnet#9790) * wip * Adding Span API perf tests for comparison with arrays and to monitor progress * Reducing iteration count so tests finish in < 2 minutes * Reducing the number of test cases * Commenting out ref DangerousGetPinnableReference test * Updated csproj to use latest compiler * Tuning test cases and runtime
…progress (dotnet/coreclr#9790) * wip * Adding Span API perf tests for comparison with arrays and to monitor progress * Reducing iteration count so tests finish in < 2 minutes * Reducing the number of test cases * Commenting out ref DangerousGetPinnableReference test * Updated csproj to use latest compiler * Tuning test cases and runtime Commit migrated from dotnet/coreclr@1bf3bbb
No description provided.