Clean up usage of string.IndexOf / ToUpper / ToLower / Trim throughout the framework #31968
- Also optimize static field TextInfo.Invariant
- Rewrite ToUpper/Lower(Invariant) call sites to use optimized method
- Also uses Starts/EndsWith(char) if available
- Also uses IndexOf(char) if available
src/libraries/System.Configuration.ConfigurationManager/src/System/Configuration/IdnElement.cs
...tem.Diagnostics.TextWriterTraceListener/src/System/Diagnostics/DelimitedListTraceListener.cs
src/libraries/System.DirectoryServices/src/System/DirectoryServices/ActiveDirectory/Utils.cs
src/libraries/System.IO.Packaging/src/System/IO/Packaging/ContentType.cs
src/libraries/System.Net.HttpListener/src/System/Net/HttpListenerRequest.cs
  public static char ToLowerInvariant(char c)
  {
-     return CultureInfo.InvariantCulture.TextInfo.ToLower(c);
+     return TextInfo.Invariant.ToLower(c);
I'd assumed (I guess incorrectly) that devirtualization was kicking in and turning these into direct calls... but I guess not?
src/libraries/System.Private.CoreLib/src/System/Environment.Unix.cs
@terrajobst Steve said you were looking at new analyzer rules we should enable. Many of the things I changed here would have been caught by https://docs.microsoft.com/en-us/visualstudio/code-quality/ca1307, but that rule is very noisy because it uses overly broad pattern-matching and has a high false positive rate. But there's possibly a need for a rule that triggers on just the APIs we know to be problematic, like the specific signatures
Further reading: #30740 (comment)
- internal static TextInfo Invariant => s_invariant ??= new TextInfo(CultureData.Invariant);
-
- private static volatile TextInfo? s_invariant;
+ internal static readonly TextInfo Invariant = new TextInfo(CultureData.Invariant);
Does this have any negative impact on startup?
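For readers following the diff above, the two initialization shapes can be sketched roughly as below. `Table`, `LazyHolder`, and `EagerHolder` are hypothetical stand-ins, not the real `TextInfo` code; the sketch only illustrates when the instance is created in each shape, which is what the startup question is about.

```csharp
#nullable enable
using System;

// Both holders hand back one shared instance; they differ in when it is built.
Console.WriteLine(ReferenceEquals(LazyHolder.Instance, LazyHolder.Instance));   // True
Console.WriteLine(ReferenceEquals(EagerHolder.Instance, EagerHolder.Instance)); // True

// Hypothetical stand-in for an object that is costly to construct.
sealed class Table
{
    public string Name { get; }
    public Table(string name) => Name = name;
}

// Old shape: built on first access; every read goes through a property
// that null-checks a volatile field.
static class LazyHolder
{
    private static volatile Table? s_instance;
    public static Table Instance => s_instance ??= new Table("lazy");
}

// New shape: built when the type is initialized, then read as a plain field.
static class EagerHolder
{
    public static readonly Table Instance = new Table("eager");
}
```

The lazy shape defers the construction cost until the first caller needs it; the eager shape moves that cost into type initialization in exchange for a cheaper read afterwards.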
stephentoub
left a comment
Thanks.
  if (type.FullName != null)
  {
-     if (type.FullName.StartsWith("System.Collections.Generic.IEnumerable`1"))
+     if (type.FullName.StartsWith("System.Collections.Generic.IEnumerable`1", StringComparison.Ordinal))
cc @layomia
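A minimal sketch of the ordinal prefix check from the diff above, pulled into a standalone helper. `IsGenericEnumerable` is a hypothetical name used only for illustration; the point is that a type name is an identifier, so it is compared ordinally rather than with the current culture's rules.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper mirroring the check in the diff.
static bool IsGenericEnumerable(Type type) =>
    type.FullName != null &&
    type.FullName.StartsWith("System.Collections.Generic.IEnumerable`1", StringComparison.Ordinal);

Console.WriteLine(IsGenericEnumerable(typeof(IEnumerable<int>))); // True
Console.WriteLine(IsGenericEnumerable(typeof(List<int>)));        // False
```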
@dotnet/jit-contrib Any thoughts as to Steve's comment at #31968 (comment)? I can speak to impacting steady-state perf, but I don't really know how to reason about startup perf.

A quick benchmark, with disassembly before and after the change:

[Benchmark]
[Arguments("hello")]
[Arguments("XYZ")]
public string ToUpperInvariant(string s) => s.ToUpperInvariant();

; BEFORE - Benchmark.ToUpperInvariant(string)
00007ffd`0073a550 488bca mov rcx,rdx
00007ffd`0073a553 3909 cmp dword ptr [rcx],ecx
00007ffd`0073a555 e9de9ed7ff jmp CLRStub[MethodDescPrestub]@7ffd004b4438 (00007ffd`004b4438) ; string.ToUpperInvariant (non-virtual dispatch)
; BEFORE - string.ToUpperInvariant()
00007ffd`00739dd0 56 push rsi
00007ffd`00739dd1 4883ec20 sub rsp,20h
00007ffd`00739dd5 488bf1 mov rsi,rcx
00007ffd`00739dd8 48b98024ce1691020000 mov rcx,29116CE2480h
00007ffd`00739de2 488b09 mov rcx,qword ptr [rcx]
00007ffd`00739de5 48b8c8c56100fd7f0000 mov rax,7FFD0061C5C8h
00007ffd`00739def ff10 call qword ptr [rax] ; CultureInfo.get_TextInfo (non-virtual dispatch)
00007ffd`00739df1 488bc8 mov rcx,rax
00007ffd`00739df4 488bd6 mov rdx,rsi
00007ffd`00739df7 3909 cmp dword ptr [rcx],ecx
00007ffd`00739df9 4883c420 add rsp,20h
00007ffd`00739dfd 5e pop rsi
00007ffd`00739dfe e9fde6d8ff jmp CLRStub[MethodDescPrestub]@7ffd004c8500 (00007ffd`004c8500) ; TextInfo.ToUpper (non-virtual dispatch)
; BEFORE - CultureInfo.get_TextInfo()
00007ffc`fd169e20 57 push rdi
00007ffc`fd169e21 56 push rsi
00007ffc`fd169e22 4883ec28 sub rsp,28h
00007ffc`fd169e26 488bf1 mov rsi,rcx
00007ffc`fd169e29 48837e1000 cmp qword ptr [rsi+10h],0
00007ffc`fd169e2e 7531 jne System_Private_CoreLib!System.Globalization.CultureInfo.get_TextInfo()+0xffffffff`a0b22cf1 (00007ffc`fd169e61)
00007ffc`fd169e30 48b9183305fdfc7f0000 mov rcx,7FFCFD053318h
00007ffc`fd169e3a e85163945f call CoreCLR!coreclr_shutdown_2+0x10eb0 (00007ffd`5cab0190)
00007ffc`fd169e3f 488bf8 mov rdi,rax
00007ffc`fd169e42 488b5630 mov rdx,qword ptr [rsi+30h]
00007ffc`fd169e46 488bcf mov rcx,rdi
00007ffc`fd169e49 e8cae5d8ff call CLRStub[MethodDescPrestub]@7ffcfcef8418 (00007ffc`fcef8418)
00007ffc`fd169e4e 0fb65660 movzx edx,byte ptr [rsi+60h]
00007ffc`fd169e52 885730 mov byte ptr [rdi+30h],dl
00007ffc`fd169e55 488d4e10 lea rcx,[rsi+10h]
00007ffc`fd169e59 488bd7 mov rdx,rdi
00007ffc`fd169e5c e80f55945f call CoreCLR!coreclr_shutdown_2+0x10090 (00007ffd`5caaf370)
00007ffc`fd169e61 488b4610 mov rax,qword ptr [rsi+10h]
00007ffc`fd169e65 4883c428 add rsp,28h
00007ffc`fd169e69 5e pop rsi
00007ffc`fd169e6a 5f pop rdi
00007ffc`fd169e6b c3 ret
; AFTER - Benchmark.ToUpperInvariant(string)
00007ffd`0075a280 8b0a mov ecx,dword ptr [rdx]
00007ffd`0075a282 48b930267a1ff9010000 mov rcx,1F91F7A2630h ; string.ToUpperInvariant inlined into caller
00007ffd`0075a28c 488b09 mov rcx,qword ptr [rcx]
00007ffd`0075a28f 3909 cmp dword ptr [rcx],ecx
00007ffd`0075a291 e962e2d8ff jmp CLRStub[MethodDescPrestub]@7ffd004e84f8 (00007ffd`004e84f8) ; TextInfo.ToUpper (non-virtual dispatch)

CI failures seem to be known issues.

To answer (at least in part) the startup question, the intrinsics are recognized in the importer, and AFAICT there are none that are only expanded when optimizing - @dotnet/jit-contrib are you aware of any issues with recognizing intrinsics in Tier0?

Actually it's the other way round -- we generally won't expand intrinsics at Tier0/minopts:

// Under debug and minopts, only expand what is required.
if (!mustExpand && opts.OptimizationDisabled())
{
    *pIntrinsicID = CORINFO_INTRINSIC_Illegal;
    return retNode;
}

Must expand is only true for the recursive HW intrinsic methods and for a few of the old-style intrinsics.

Thanks @AndyAyersMS - do you think that the lack of expansion for intrinsics in minopts/tier0 should inform their use?

FWIW, I believe that all of the hw intrinsics are expanded, except those that are not directly supported (i.e. only expanded when
The SIMD intrinsics seem to be recognized independent of optimization level also.

The HW intrinsics are expanded in minopts/Tier0 only when they're seen as "recursive" methods; otherwise they're not expanded. So the instructions are not inlined into calling methods like they are when optimizing. Other new-style intrinsics (like

Not quite sure what you are asking. Arguably we should expand intrinsics at Tier0; it would probably lead to smaller and faster code and better jit throughput (see #9120). But then minopts and Tier0 would diverge...

I don't believe

If the class constructor has run before jitting, we'll burn static readonly values into codegen for numeric types, and optimize based on the actual type of the readonly statics for ref types (devirtualization, type tests, ...). There is more we could do here for immutable or partially immutable types, e.g. static readonly array lengths, string lengths, string contents. A static readonly ref type's fields are mutable, so we can't use them for optimization; likewise for static readonly structs or static readonly array elements.
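As a rough illustration of the kind of code this applies to, here is a sketch with a hypothetical `Limits` type. It only shows the source shape (a numeric static readonly field assigned once by the class initializer). Whether the JIT actually folds the value into the generated code depends on the tier and on whether the initializer has already run when the method is compiled, and nothing in this sketch can observe that.

```csharp
using System;

Console.WriteLine(Limits.Clamp(5000)); // 1024
Console.WriteLine(Limits.Clamp(7));    // 7

// Hypothetical type, for illustration only.
static class Limits
{
    // Assigned once by the class initializer and never written again,
    // so an optimizing JIT may treat it like a constant after initialization.
    private static readonly int s_max = Math.Min(4096, 1024);

    public static int Clamp(int value) => value > s_max ? s_max : value;
}
```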
@AndyAyersMS Sure. I think I'm using incorrect terminology. In the assembly below, it's burning into the codegen not a reference to the

public string ToUpperInvariant(string s) => s.ToUpperInvariant();

00007ffd`0075a280 8b0a mov ecx,dword ptr [rdx]
00007ffd`0075a282 48b930267a1ff9010000 mov rcx,1F91F7A2630h
00007ffd`0075a28c 488b09 mov rcx,qword ptr [rcx]
00007ffd`0075a28f 3909 cmp dword ptr [rcx],ecx
00007ffd`0075a291 e962e2d8ff jmp CLRStub[MethodDescPrestub]@7ffd004e84f8 (00007ffd`004e84f8)

The concern AFAIK was whether introducing a static ctor on

Addresses of statics are usually baked in to code -- for non-generic classes anyways. Perhaps we should also look at the impact on R2R codegen? Prejitted code always has to check for class initialization. I can run Tier0 and R2R diffs for you, but may not have results until tomorrow.

I'm open to whatever tests you think might be warranted. Right now I don't know how to run these tests (or even what tests to run!), but if you care to point me to them then I can take that under consideration for future changes.

Diff summary here. Diffs themselves are too large to share -- I can get you specific method diffs if you want. Overall impact on size is pretty minimal. Tier0/Tier1 diffs for SPC sometimes include a few spurious diffs; I think that may be the case here.

High-level summary of changes:
- Improve performance of string.ToUpperInvariant, string.ToLowerInvariant, char.ToUpperInvariant, char.ToLowerInvariant by binding them directly to the correct TextInfo instance rather than bouncing through CultureInfo.Invariant.
- Avoid dereferencing CultureInfo.InvariantCulture where possible at the call sites and use the invariant-specific code paths.
- Replace the pattern if (string.Trim().Length != 0) to use string.IsNullOrWhiteSpace instead.
- Replace call sites that incorrectly use culture-aware string.ToUpper / string.ToLower to use ToUpperInvariant / ToLowerInvariant instead.
- Replace call sites that incorrectly use culture-aware string.IndexOf(...) to use string.IndexOf(..., StringComparison.Ordinal) instead.
- Replace call sites that incorrectly use culture-aware string.StartsWith(...) / string.EndsWith(...) to use ordinal string.StartsWith(..., StringComparison.Ordinal) / string.EndsWith(..., StringComparison.Ordinal) instead.
- If a project specifically targets .NET Core latest, change string.IndexOf("x", StringComparison.Ordinal) to string.IndexOf('x').

There's also lots of opportunity for performance improvements here, such as avoiding allocations of temporary objects. I wasn't really focusing on that as much as I was the low-hanging fruit of making sure the correct comparison enum value was passed into the call sites.
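The call-site rewrites in the summary can be sketched as follows. The variable names and sample strings are made up for illustration; only the API calls themselves come from the summary.

```csharp
using System;

string input = "  \t ";

// Before: may allocate a trimmed copy, and throws if input is null.
bool hasTextOld = input.Trim().Length != 0;
// After: no temporary string, and null-safe.
bool hasTextNew = !string.IsNullOrWhiteSpace(input);

// Casing of identifiers or protocol tokens: invariant, not the current culture.
string header = "Content-Type".ToLowerInvariant();

// Searching for a literal substring: state the comparison explicitly.
int colon = "key:value".IndexOf(":", StringComparison.Ordinal);
// The char overload is already ordinal.
int colonChar = "key:value".IndexOf(':');

Console.WriteLine($"{hasTextOld} {hasTextNew} {header} {colon} {colonChar}");
// False False content-type 3 3
```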
One thing that made this a bit difficult is that string and ROS<char> behave differently as far as culture-awareness of certain methods. For example:

So if you don't know whether the this parameter provided to these methods is a string or a ReadOnlySpan<char>, you won't know which StringComparison is being used.
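A small sketch of that difference, written for this summary rather than taken from the original comment. It relies on two documented behaviors: string.StartsWith(string) and string.IndexOf(string) without a comparison argument use the current culture, while the ReadOnlySpan<char> extension methods of the same shape compare ordinally. With plain ASCII input the results agree, so the sketch shows the call shapes only, not a case where they diverge.

```csharp
using System;

string text = "hello world";

// string receiver, no comparison argument: culture-sensitive.
bool startsViaString = text.StartsWith("hello");
int indexViaString = text.IndexOf("world");

// ReadOnlySpan<char> receiver, same-looking call: ordinal.
bool startsViaSpan = text.AsSpan().StartsWith("hello".AsSpan());
int indexViaSpan = text.AsSpan().IndexOf("world".AsSpan());

// Passing the comparison explicitly removes the ambiguity for both receivers.
bool startsExplicit = text.StartsWith("hello", StringComparison.Ordinal);
bool startsExplicitSpan = text.AsSpan().StartsWith("hello".AsSpan(), StringComparison.Ordinal);

Console.WriteLine($"{startsViaString} {startsViaSpan} {startsExplicit} {startsExplicitSpan}");
Console.WriteLine($"{indexViaString} {indexViaSpan}");
```

Spelling out StringComparison at every call site means the code reads the same whether the receiver is later changed from a string to a span or back.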