Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Improve Enum.TryParse performance. Fixes #70#81

Closed
Alexx999 wants to merge 2 commits into
dotnet:masterfrom
Alexx999:fix_issue_70
Closed

Improve Enum.TryParse performance. Fixes #70#81
Alexx999 wants to merge 2 commits into
dotnet:masterfrom
Alexx999:fix_issue_70

Conversation

@Alexx999
Copy link
Copy Markdown

@Alexx999 Alexx999 commented Feb 4, 2015

The idea here is that enum value is always integer, so it will surely fit into long if it's signed or into ulong if it's unsigned. Furthermore, string-to-enum code uses ulong to store both signed and unsigned results anyway, so there is no real need to make any difference between signed/unsigned other than handling negative numbers or cast to intermediate results.

Performance figures:
Since I was unable to run MeasureIt on CoreClr, not was I able to find System.dll to get Stopwatch this results was taken using DateTime.Now and are fairly approximate, but difference is so big that it is hard to make mistake here.

Measurements taken on i5-4670K CPU at 4.2GHz, Win 8.1, release build
Old:

Good run: 218ms
Bad run: 13657ms
Big numbers run: 10955ms
Small numbers run: 295ms
All letters run: 125ms

New:

Good run: 215ms
Bad run: 187ms
Big numbers run: 252ms
Small numbers run: 218ms
All letters run: 123ms

In worst case improvement are over 70 times, plus there is noticeable speedup of parsing numbers as such (something near 25% since we now do less operations)

Code I used to test this can be viewed as gist: https://gist.github.com/Alexx999/84640c2c31e4f5806cd1

Issues/Open questions:

I found one edge case with undefined behavior - more precisely, when you do supply value that overflows enum's backing data type. This case is not covered by tests in CoreFx, it is present in my test as "Case 13".
For example, imagine following code:

enum SmallEnum : byte
{}
bool result = Enum.TryParse(ulong.MaxValue.ToString(), out smallEnum);

In this case you'll get false with old code (and new code will parse it fine).

I believe this is inconsistent with how enums work in general - you can do this:

ulong bigValue = ulong.MaxValue;
SmallEnum smallEnum = (SmallEnum)bigValue;

and it will not fail. Even more than that - this behavior actually is covered by tests, as can be seen here:
https://github.com/dotnet/corefx/blob/master/src/System.Runtime/tests/System/Enum.cs#L290

So is it okay to have such behavior changes or it must be changed to perfectly mimic old behavior?

Fixes #70

@AlexGhiondea
Copy link
Copy Markdown

Thanks for the contribution.

Since this code is shared with desktop we need to make sure we don't break compatibility with it.

Cases where the behavior is different before/after the change are considered breaking changes and need to be mitigated.

There are a couple of ways we mitigate breaking changes:

  • the code can be if-def'ed to only impact CoreCLR
  • we can code the fix such that it behaves one way on older platforms (ie. current version of Desktop) but get the new behavior for newer platforms. (this is handled by/with the AppContext class).

For this specific case, you can probably special case the situation where the enum backing type is smaller than the value that you need to assign to it and in that case have the old behavior.

@Alexx999
Copy link
Copy Markdown
Author

Alexx999 commented Feb 5, 2015

OK, I got it - we play safe here :)
I've got one idea how to make it behave exactly like the old one without hurting performance too much, I'll make it work that way

@Alexx999
Copy link
Copy Markdown
Author

Alexx999 commented Feb 5, 2015

Implemented that change.
New performance figures:

Good run: 214ms
Bad run: 187ms
Big numbers run: 443ms
Small numbers run: 319ms
All letters run: 121ms

It is now a bit slower with large numbers (still 25 times faster than original) and has roughly equal performance with small numbers.

@AlexGhiondea
Copy link
Copy Markdown

Thanks for the contribuition. I am going to run some validation and if everything is ok I'll merge this in.

@nguerrera
Copy link
Copy Markdown

Note that bool is a valid enum underlying type. How does this change interact with that?

e.g. Given:

.class public sealed PeculiarEnum extends [mscorlib]System.Enum
{
  .field public specialname rtspecialname bool value__
  .field public static literal valuetype PeculiarEnum A = bool(false)
  .field public static literal valuetype PeculiarEnum B = bool(true)
}

Does the behavior of Enum.Parse("0", typeof(PeculiarEnum)) change?

At a glance, I suspect it would have thrown FormatException before, but return PeculiarEnum.A now.

@Alexx999
Copy link
Copy Markdown
Author

Alexx999 commented Feb 6, 2015

@nguerrera yeah, you're right.
I've updated test with this particular case. And I think we should update Enum tests in CoreFx too, because they don't cover this edge cases right now.

I think this particular case can be worked around with additional check.

Good run: 213ms
Bad run: 182ms
Big numbers run: 482ms
Small numbers run: 356ms
All letters run: 122ms

PS: Turns out there is similar issue with char. This is harder than I'd expected...

@Alexx999
Copy link
Copy Markdown
Author

Alexx999 commented Feb 6, 2015

Updated test to check if char-based enum works correctly.
It turns out that in code

Enum.Parse("0", typeof(CharEnum))

"0" was treated like a char and not like number, so the result must be

Enum.ToObject(typeof(CharEnum), '0')

and it was equal to

Enum.ToObject(typeof(CharEnum), 0)

Now new code basically mimics behavior of Convert.ChangeType approach, but without throwing.
As a bonus it's now faster compared to previous solution with extra checks - it had performance regression is "small numbers" test

Good run: 214ms
Bad run: 203ms
Big numbers run: 247ms
Small numbers run: 229ms
All letters run: 124ms

Comment thread src/mscorlib/src/System/Enum.cs Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for switching these statements?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that wasn't intended.

@AlexGhiondea
Copy link
Copy Markdown

Overall the change looks good. I left some comments in the PR.
Could you squash the commits into a single one?

If nothing comes out of the test run I am doing I'll merge this in!

@Alexx999
Copy link
Copy Markdown
Author

Alexx999 commented Feb 6, 2015

@AlexGhiondea OK, I cleaned up remaining things and squashed it to one commit.

@AlexGhiondea
Copy link
Copy Markdown

Thanks for all the work you did on this change. I have kicked off a different test run and should have results on Monday. The test run I did didn't show up any issues.

However, I talked with @jkotas about this and, to be on the safe side, we should update the code so that in the default case of the switch statement we fall back to the slow behavior we had before. This would cover the bool/float/double case.

Can you also check what happens when we have an enum backed by a float/double? I know you cannot create these in C#, but via IL or C++ they could still be generated and we need to make sure we can correctly handle them.

Again, thank you for doing the due diligence around this fix.

@Alexx999
Copy link
Copy Markdown
Author

Alexx999 commented Feb 7, 2015

@AlexGhiondea Indeed, @jkotas is right - there is possibility of having enum backed by float/double/IntPtr/UIntPtr.
This can be seen here:

switch (InternalGetCorElementType())

But on the same time ToObject() method doesn't support any of those types (as opposed to bool and char - they're actually supported) so parsing it will fail anyway, see
switch (typeCode)

Actually, I was able to find regression with some uber-tricky enums that look like:

enum UberTricky : double
{
   1 = 0, 0 = 1
}
Enum.Parse(typeof(UberTricky), "0");

It throws argument exception with original code and will parse fine now (as string and not by value).
I've updated test in my gist to cover this particular case.

@Alexx999
Copy link
Copy Markdown
Author

Alexx999 commented Feb 7, 2015

Okay, added one more fixup that will ensure correct behavior in all cases - it will throw ArgumentException with same message as before.
Performance seems to be unaffected, at least difference is less than measurement error.

Good run: 212ms
Bad run: 204ms
Big numbers run: 245ms
Small numbers run: 237ms
All letters run: 129ms

@AlexGhiondea
Copy link
Copy Markdown

@Alexx999 I would like to thank you for the work you put into this PR.

We have updated our contribution guidelines to better call out the bar for accepting pull requests. You can find it here: https://github.com/dotnet/coreclr/wiki/Contribution-guidelines

We are aware of the work you put into this PR and we truly appreciate it! But, unfortunately, since it is not a mainline scenario and has unknown risk, this PR does not meet the bar.

We are sorry for not being able to accept it and hope you will continue making contributions!

Thank you again for your contribution!

@phrohdoh
Copy link
Copy Markdown

It is quite sad to see a PR like this being discarded after all of the work put forward here.

@Alexx999
Copy link
Copy Markdown
Author

that's a bit of surprise - it just seemed like all possible cases are covered at last.

but probably you are right - this case is not really common.

@AlexGhiondea
Copy link
Copy Markdown

@phrohdoh I know.

One thing I would like to do is, with @Alexx999's permission, capture the tests in the CoreFX repo (I can make the change or review a PR).
That way, if we ever touch that area in the future we have a better understanding of the tricky parts.

@weshaggard
Copy link
Copy Markdown
Member

@phrohdoh I agree it is unfortunate as well and I also appreciate the effort that @Alexx999 put into this and while we were having the conversation on the PR we were also having a conversation internally to try and figure out the appropriate bar, hopefully that is a little clearer now. In this case while it would be nice to take the reward isn't worth the potential risk.

Thanks again @alex999 and I look forward to your future contributions.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parsing string to enum with digit as first char takes a very long time

5 participants