More efficient ParseNumber & MatchChars#3163
Conversation
ParseNumber & related more efficient
|
@varocarbas, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR. |
|
@varocarbas, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR. |
|
@jkotas do you want to take a look at this one? Do we have thorough tests for this area already do you know? |
|
Number parsing and formatting should be covered by corefx tests. It would be a really good idea to run them before this is merged. Instructions on how to run them are at https://github.com/dotnet/coreclr/blob/master/Documentation/building/testing-with-corefx.md |
|
@varocarbas Would you mind running the corefx tests on this change? |
|
@jkotas Sure, no problem. I will do it within the next days. |
|
@jkotas I have finished the tests and wrote down my conclusions here. The summary is that no errors were found (logically) and that both versions showed too variable behaviours. Apparently, relying on CoreRun.exe (required to perform the CoreFX tests) provokes a high instability which makes very difficult to perform accurate measurements for so quick actions. This is precisely the reason why I spent some time to come up with a good enough testing framework where both versions could deliver a more or less consistent performance. |
|
@varocarbas I have read your long story about how you made this change. Thank you for putting a lot of energy into it, and running the functional corefx tests. But it is not fully clear to me from your write up what the change is actually improving. Could you please whip a simple 10 line program that calls public .NET number parsing API (e.g. million times in a loop) that runs measurably faster after your change? #6645 is a good example of performance improvement PR description. |
Sorry about that. Note that this was my first submission here and took advantage from the event, but I do understand that it became a bit too thick. Also bear in mind that the code being improved is highly optimised already and the performed actions are extremely quick.
Note that these functions deal with all the number parsing actions from all the numeric types (i.e., quite a few different scenarios). In fact the CoreFX test which I have linked seems to be quite throughful; but as said, for so small variations, these conditions aren't too unstable. Anyway... I will re-implement a simple enough approach on the lines which you are proposing and deliver some global results. The measurements might not be too reliable (now 5% and then 10% and then 2%, etc.) but should prove that my approach is undoubtedly better. |
|
I have tried the keep-it-simple approach but is certainly not easy. Here you have a very simple version of some of my test codes: static unsafe void Simple(int type)
{
Stopwatch sw = new Stopwatch();
sw.Start();
int count = 0;
int max = 50000;
while (count < max)
{
count++;
string input = count.ToString() + "." + count.ToString();
fixed (char* p = input)
{
char* p2 = p;
var val =
(
type == 0 ? New.ParseNumber(ref p2, NumberStyles.Any, null, true) :
Old.ParseNumber(ref p2, NumberStyles.Any, null, true)
);
}
}
sw.Stop();
Console.WriteLine(sw.Elapsed.TotalSeconds.ToString());
Console.ReadLine();
}
But even under ideal conditions, the differences with respect to the original version aren't too big and that's why caring so much about the testing environment. For example: if I run this code in my computer by alternating old/new versions, I might get something like (new-old) 1.76-1.77, 1.77-1.85, 1.79-1.80, etc. Bringing all the inestability associated with calling public APIs (I understand using CoreRun.exe) makes these results even worse. Increasing the number of interations doesn't help (not to get consistent outputs; some times goes in one direction and others in the contrary one). I have tried many different configurations (explained in the long-story tests) and come up with reliable testing conditions: comprehensive enough, but also likely to deliver equivalent differences new-old when being executed in virtually any situation. See, when I started all this I wasn't sure about how to proceed/what to expect here. I just took the first part which I saw with potential and started working on it. I am certainly happy with the result because I did optimise a highly-optimised code, spent some time on coming up with a quite reliable testing framework under so difficult conditions and documented the whole process. But I also understand that all this might be a bit outside your expectations and you might not be willing to spend the required time to understand all this properly. I can work on the test applications to simplify them a bit. I can even make them call |
|
You test is testing the performance of number parsing together with multiple other things: It is converting two numbers to a string, contacting 3 strings together, and finally parsing a number. The test is likely dominated by memory allocation that's why you have hard time to see the difference; and that's why you see variability in your numbers - background GC scheduling differs from run to run, etc. Also, it is important that for your test to be testing the public parsing API, not some internal method. The performance of the public parsing APIs is what matters. Could you please update your test to measure performance of number parsing only, and using public API? |
This is just a random sample to help understand the problem (i.e., many different inputs). I have tried many different configurations and, as said, the final versions of my testing applications deliver a very stable performance (i.e., equivalent differences new/old when executed at different points in the same computer).
As said, the problem is that the way to communicate to that API (i.e., using CoreRun.exe) provokes a huge instability. It is much slower than the actual call (e.g.,
Number parsing is what delivers In summary, if I did spend some time on creating a reliable enough testing framework, during that analysis, I drew some conclusions and, what is even more important, I did come up with a reliable and comprehensive enough testing approach, what would be the point of doing the opposite than what that work advises? I can build a new testing app (by applying the conclusions from the aforementioned work) with instructions addressing all your concerns. I can even create another version calling to CoreRun.exe under exactly the same conditions and showing the referred instability. In any case, you have to understand the maximum you can get here: under properly-testing conditions, reaching a consistent improvement above 5% would be excellent news. This is a complex improvement which do need a complex analysis and I do accept the eventuality of you not being interested in it. Unless you don't want me to do so, I will work on the aforementioned more-clear-but-reliable-enough testing approach which hopefully will be good for you. I will write you back in some days. |
|
Optimizing internal functions in isolation is fine strategy as you are working on an improvement. The ultimate performance improvement has to be measured using public APIs. The public APIs is what the .NET programs out there call, and what matters to .NET users. I understand that the public parsing APIs take number of arguments - to demonstrate benefit of your change, just pick one where the improvement is most significant to start with. It is pretty normal to see instability during performance testing. The key is to modify you testing methodology to eliminate it, or filter it out. I do not think it has anything to do with corerun. We are interested in performance improvements, but they have to be supported by right data. We believe that good performance engineering has to be data-driven. |
Note that the code being modified is part of The CoreFX tests have also some reliance on this process in order to account for the last mscorlib.dll version, I guess; but this situation seems different (it might even be stable enough; I didn't spend too much on it). The problem is that there is no CoreFX test referring to the CoreCLR code (at least, I couldn't find any). That's why I chose So, if I want to do a proper validation with the public API, I would have to recompile mscorlib and rely on CoreRun.exe to run my sample (what, as per my experiences, has a quite negative impact). Unless there is another option which I don't know. Anyway, I will think about the best way to account for all these issues and post a hopefully-appealing-to-you test. It will still take me some days though. |
Yes, that's the right way to do it. cc @jamesqo in case he has something to add about how to do proper testing of perf improvements. |
|
@varocarbas I couldn't figure out from reading your comments how you are running perf tests. But if you are having trouble, here are the steps I normally follow:
static unsafe void Simple()
{
int count = 0;
int max = 1000000;
var strings = Enumerable.Range(1, max).Select(c => c + "." + c).ToArray();
Stopwatch sw = new Stopwatch();
sw.Start();
while (count < max)
{
string input = strings[count];
count++;
fixed (char* p = input)
{
char* p2 = p;
var val = PublicNumberParsingApiThatCallsYourMethod();
}
}
sw.Stop();
Console.WriteLine(sw.Elapsed.TotalSeconds);
Console.ReadLine();
}
{
"buildOptions": {
"allowUnsafe": true,
"debugType": "portable",
"emitEntryPoint": true
},
"dependencies": {},
"frameworks": {
"netcoreapp1.0": {
"dependencies": {
"Microsoft.NETCore.App": "1.0.0"
},
"imports": "dnxcore50"
}
},
"runtimes": {
"win10-x64": {}
}
}
|
|
@jkotas We should consider adding documentation for perf testing later. I myself was very confused about how to use the build mscorlib until benaadams enlightened me with https://github.com/dotnet/coreclr/issues/4678#issuecomment-215638533 |
|
@jamesqo Thanks. Very helpful. |
|
The documentation on how to use coreclr with your local modifications - quite similar to the ones posted by @jamesqo above - is at https://github.com/dotnet/coreclr/blob/master/Documentation/workflow/UsingYourBuild.md . They have been revamped recently. If there are parts that are confusing or need more details, please do suggest improvements. |
|
@jkotas I will look into all this in some days, but this isn't the approach which I tried in my first tests. Back then (8 months ago), the main documentation I found about testing mscorlib.dll didn't involve any calls to public APIs/Core/etc. It just included the steps to generate CoreRun.exe locally, together with wathever library (+ coreclr.dll + mscorlib.dll), and to execute the tests via command prompt ("CoreRun.exe test_file.exe"). Everything was very slow and unstable; that's why I removed CoreRun.exe from the equation and preferred to create all the code myself. Now things seem to be better. Excellent news! |
|
@jkotas A quick update to highlight that I will delay my tests a bit more than planned (perhaps within the next 2 weeks?!). Note that I did follow @jamesqo instructions, but found the problems described below. On the bright side, this new Core-based approach definitively seems much more reliable than the old CoreRun.exe-command-line one. Excellent news on this front. The not-so-good news is that the new approach also has similar problems than the old one regarding other-than-mscorlib libraries. More specifically, I haven't been able to find a way around an error in In any case, I am currently very busy, yesterday I spent on this more than planned (+ got a quite negative result, what isn't too motivating) and still have to come up with a good testing approach for the new conditions (+ perform the tests). As said, I will try to come back within the next 2 weeks. |
|
@varocarbas, it would probably help if you posted a screenshot of your problems, or just quoted the error message. It's easier for us to understand what's going on that way. |
|
@jamesqo Sorry about that. I thought that it was a well-known problem (as said, similar issues happened with the previous version; although I think that I was able to use I did everything exactly as instructed in your message by using the following ( using System;
namespace ConsoleApplication
{
public class Program
{
public static void Main(string[] args)
{
Console.WriteLine("anything");
}
}
}And After publishing the project, everything works as expected. I can execute the application in \bin\release\netcoreapp1.0\win7-x64\publish\app.exe without any problem. But when I replace the If I remove the As said, the difference between the sizes of the |
|
The instructions had a bug that lead to this problem - I have fixed it in #7648. |
|
@jkotas Thanks for the quick response. It did look like an incompatibility issue. I am not sure if I understood your modification right, but I set the path as instructured, republished/rerun and got the same error. Even copied all the DLLs in the \Tools\dotnetcli\ folder directly and didn't work either. In any case, I wasn't expecting to spend time on this now; neither you to fix the problems (although thanks again to both of you for a so quick response). I will look into all this in some days and decide what to do. |
|
@jkotas In general, I have observed that the original version shows a more unstable behaviour (mainly under very long testing conditions, not the case with the sample below), although the new one wasn't too stable either. I did measure the differences externally (didn't check the documentation again, but your original advice to fix the I used the same project.json file included in one of my previous comments and, for my last test, the following Program.cs: using System;
using System.Globalization;
namespace App
{
public class Program
{
public static void Main(string[] args)
{
string[] inputs = new string[]
{
"123456.789", "123456",
"985465871256.12", "123456e2",
"-999999999999.99", "0.000000000000000000000000000000001",
"9999999999999999999999999999999999.9", "4B414.D000000011613C3eAF",
"(523)", "999e5", "111111.5656487879825632303124",
"12347%", "$52", "52$", "asfasfas", "123esreesfaf532.231",
"99,99,999.55", "9,999.55"
};
decimal outVal = 0m;
int count = 0;
int max = 5000000;
while (count < max)
{
count++;
foreach (string input in inputs)
{
try
{
decimal.TryParse(input, NumberStyles.Any, CultureInfo.InvariantCulture, out outVal);
}
catch
{
}
}
}
}
}
}I copied all the contents of the published app to the folders "new" and "orig", replaced the mscorlib.dll (and others) with the corresponding versions and updated accordingly the paths in the measuring application, whose code is the following: using System;
using System.Collections.Generic;
using System.Diagnostics;
namespace Measure
{
class Program
{
static void Main(string[] args)
{
double[] times = new double[2];
int[] counts = new int[2];
string[] names = new string[] { "new", "orig" };
int maxCount = 2;
int count0 = 0;
int count = 0;
Random rand = new Random();
while (count0 < maxCount * 2)
{
count0++;
count = (rand.Next(0, 1000) % 2 == 0 ? 0 : 1);
if (counts[count] == maxCount)
{
count = (count == 0 ? 1 : 0);
}
counts[count]++;
times[count] += RunTest(names[count]);
}
Console.WriteLine("new total: " + times[0].ToString("0.00"));
Console.WriteLine("orig total: " + times[1].ToString("0.00"));
Console.ReadLine();
}
private static double RunTest(string name)
{
string file = @"ini_path\" + name + @"\bin\Release\netcoreapp1.0\win7-x64\publish\app.exe";
Stopwatch sw = new Stopwatch();
sw.Start();
Process process = Process.Start(file);
process.WaitForExit();
sw.Stop();
Console.WriteLine(name + ": " + sw.Elapsed.TotalSeconds.ToString("0.00"));
return sw.Elapsed.TotalSeconds;
}
}
}As you can see, measure.exe calls the corresponding app.exe versions in a random order certain number of times. In my last tests, I run this version of the application 3 times (concurrently) and got (new total vs. orig total): If I run them sequentially (without overlapping), the behaviour would have been different; also in a different computer; also in the same computer under different workload; even in the same computer under equivalent load just one minute later. In any case and independently upon the specific conditions, the overall behaviour will be something on these lines (i.e., the new approach being 1-2% better). Sorry again for the delay and for the not-too-simple output, but this seems the best I can do under the current conditions. |
|
Thank you! This is simple test I was looking for. I have modified it to not catch exceptions and to measure the time it takes directly in the Main method. I am seeing pretty consistent 10% improvement in this test on my machine. |
|
Opened dotnet/corert#2130 to get this ported to CoreRT. @varocarbas feel free to pick it up if you are interested. If not, that's fine too ... somebody else will take care of it. |
|
@jkotas 10%! Wow! I guess that your machine is notably more powerful than mine (since the first tests, I confirmed that the differences were more relevant in more powerful computers, but never saw anything above 7% in the original setup and 3% in this last one). Yes, I can take care of the new issue ASAP (quite busy now; perhaps by the next week). Just two clarifications: I was measuring times externally because your |
More efficient ParseNumber & MatchChars implementation for numeric values parsing Fixes #2130
Quicker but identical algorithm.
Tests, codes and explanations in http://varocarbas.com/open_net/improvements_final/ (& linked pages).