Skip to content

Conversation

@GrabYourPitchforks
Copy link
Member

@GrabYourPitchforks GrabYourPitchforks commented Feb 8, 2020

High-level summary of changes:

  • Improve performance of string.ToUpperInvariant, string.ToLowerInvariant, char.ToUpperInvariant, char.ToLowerInvariant by binding them directly to the correct TextInfo instance rather than bouncing through CultureInfo.Invariant.

  • Avoid dereferencing CultureInfo.InvariantCulture where possible at the call sites and use the invariant-specific code paths.

  • Replace the pattern if (string.Trim().Length != 0) to use string.IsNullOrWhiteSpace instead.

  • Replace call sites that incorrectly use culture-aware string.ToUpper / string.ToLower to use ToUpperInvariant / ToLowerInvariant instead.

  • Replace call sites that incorrectly use culture-aware string.IndexOf(...) to use string.IndexOf(..., StringComparison.Ordinal) instead.

  • Replace call sites that incorrectly use culture-aware string.StartsWith(...) / string.EndsWith(...) to use ordinal string.StartsWith(..., StringComparison.Ordinal) / string.EndsWith(..., StringComparison.Ordinal) instead.

  • If a project specifically targets .NET Core latest, change string.IndexOf("x", StringComparison.Ordinal) to string.IndexOf('x').

There's also lots of opportunity for performance improvements here, such as avoiding allocations of temporary objects. I wasn't really focusing on that as much as I was the low-hanging fruit of making sure the correct comparison enum value was passed into the call sites.

One thing that made this a bit difficult is that string and ROS<char> behave differently as far as culture-awareness of certain methods. For example:

string s = GetString();
int idxA = s.IndexOf("target"); // <-- CurrentCulture

ReadOnlySpan<char> span = s;
int idxB = span.IndexOf("target"); // <-- Ordinal

So if you don't know whether the this parameter provided to these methods is a string or a ReadOnlySpan<char>, you won't know which StringComparison is being used.

public static char ToLowerInvariant(char c)
{
return CultureInfo.InvariantCulture.TextInfo.ToLower(c);
return TextInfo.Invariant.ToLower(c);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd assumed I guess incorrectly that devirtualization was kicking in and enabling this to be direct calls... but I guess not?

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Feb 8, 2020

@terrajobst Steve said you were looking at new analyzer rules we should enable. Many of the things I changed here would have been caught by https://docs.microsoft.com/en-us/visualstudio/code-quality/ca1307, but that rule is very noisy because it uses overly broad pattern-matching and has a high false positive rate. But there's possibly a need for a rule that triggers on just the APIs we know to be problematic, like the specific signatures string.IndexOf(string) and string.StartsWith(string).

Further reading: #30740 (comment)

internal static TextInfo Invariant => s_invariant ??= new TextInfo(CultureData.Invariant);

private static volatile TextInfo? s_invariant;
internal static readonly TextInfo Invariant = new TextInfo(CultureData.Invariant);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have any negative impact on startup?

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@GrabYourPitchforks GrabYourPitchforks marked this pull request as ready for review February 11, 2020 00:22
if (type.FullName != null)
{
if (type.FullName.StartsWith("System.Collections.Generic.IEnumerable`1"))
if (type.FullName.StartsWith("System.Collections.Generic.IEnumerable`1", StringComparison.Ordinal))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GrabYourPitchforks
Copy link
Member Author

@dotnet/jit-contrib Any thoughts as to Steve's comment at #31968 (comment)? I can speak to impacting steady-state perf, but I don't really know how to reason about startup perf.

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Feb 12, 2020

A quick benchmark showing that string.ToUpperInvariant is able to take advantage of the optimized codegen of calling TextInfo.Invariant directly:

[Benchmark]
[Arguments("hello")]
[Arguments("XYZ")]
public string ToUpperInvariant(string s) => s.ToUpperInvariant();
; BEFORE - Benchmark.ToUpperInvariant(string)
0007ffd`0073a550 488bca          mov     rcx,rdx
00007ffd`0073a553 3909            cmp     dword ptr [rcx],ecx
00007ffd`0073a555 e9de9ed7ff      jmp     CLRStub[MethodDescPrestub]@7ffd004b4438 (00007ffd`004b4438) ; string.ToUpperInvariant (non-virtual dispatch)
; BEFORE - string.ToUpperInvariant()
00007ffd`00739dd0 56              push    rsi
00007ffd`00739dd1 4883ec20        sub     rsp,20h
00007ffd`00739dd5 488bf1          mov     rsi,rcx
00007ffd`00739dd8 48b98024ce1691020000 mov rcx,29116CE2480h
00007ffd`00739de2 488b09          mov     rcx,qword ptr [rcx]
00007ffd`00739de5 48b8c8c56100fd7f0000 mov rax,7FFD0061C5C8h
00007ffd`00739def ff10            call    qword ptr [rax] ; CultureInfo.get_TextInfo (non-virtual dispatch)
00007ffd`00739df1 488bc8          mov     rcx,rax
00007ffd`00739df4 488bd6          mov     rdx,rsi
00007ffd`00739df7 3909            cmp     dword ptr [rcx],ecx
00007ffd`00739df9 4883c420        add     rsp,20h
00007ffd`00739dfd 5e              pop     rsi
00007ffd`00739dfe e9fde6d8ff      jmp     CLRStub[MethodDescPrestub]@7ffd004c8500 (00007ffd`004c8500) ; TextInfo.ToUpper (non-virtual dispatch)
; BEFORE - CultureInfo.get_TextInfo()
00007ffc`fd169e20 57              push    rdi
00007ffc`fd169e21 56              push    rsi
00007ffc`fd169e22 4883ec28        sub     rsp,28h
00007ffc`fd169e26 488bf1          mov     rsi,rcx
00007ffc`fd169e29 48837e1000      cmp     qword ptr [rsi+10h],0
00007ffc`fd169e2e 7531            jne     System_Private_CoreLib!System.Globalization.CultureInfo.get_TextInfo()+0xffffffff`a0b22cf1 (00007ffc`fd169e61)
00007ffc`fd169e30 48b9183305fdfc7f0000 mov rcx,7FFCFD053318h
00007ffc`fd169e3a e85163945f      call    CoreCLR!coreclr_shutdown_2+0x10eb0 (00007ffd`5cab0190)
00007ffc`fd169e3f 488bf8          mov     rdi,rax
00007ffc`fd169e42 488b5630        mov     rdx,qword ptr [rsi+30h]
00007ffc`fd169e46 488bcf          mov     rcx,rdi
00007ffc`fd169e49 e8cae5d8ff      call    CLRStub[MethodDescPrestub]@7ffcfcef8418 (00007ffc`fcef8418)
00007ffc`fd169e4e 0fb65660        movzx   edx,byte ptr [rsi+60h]
00007ffc`fd169e52 885730          mov     byte ptr [rdi+30h],dl
00007ffc`fd169e55 488d4e10        lea     rcx,[rsi+10h]
00007ffc`fd169e59 488bd7          mov     rdx,rdi
00007ffc`fd169e5c e80f55945f      call    CoreCLR!coreclr_shutdown_2+0x10090 (00007ffd`5caaf370)
00007ffc`fd169e61 488b4610        mov     rax,qword ptr [rsi+10h]
00007ffc`fd169e65 4883c428        add     rsp,28h
00007ffc`fd169e69 5e              pop     rsi
00007ffc`fd169e6a 5f              pop     rdi
00007ffc`fd169e6b c3              ret

; AFTER - Benchmark.ToUpperInvariant(string)
00007ffd`0075a280 8b0a            mov     ecx,dword ptr [rdx]
00007ffd`0075a282 48b930267a1ff9010000 mov rcx,1F91F7A2630h ; string.ToUpperInvariant inlined into caller
00007ffd`0075a28c 488b09          mov     rcx,qword ptr [rcx]
00007ffd`0075a28f 3909            cmp     dword ptr [rcx],ecx
00007ffd`0075a291 e962e2d8ff      jmp     CLRStub[MethodDescPrestub]@7ffd004e84f8 (00007ffd`004e84f8) ; TextInfo.ToUpper (non-virtual dispatch)
Method Toolchain s Mean Error StdDev Ratio RatioSD
ToUpperInvariant master XYZ 7.081 ns 0.1886 ns 0.1764 ns 1.00 0.00
ToUpperInvariant toupper_tolower XYZ 4.731 ns 0.1721 ns 0.1691 ns 0.67 0.04
ToUpperInvariant master hello 23.864 ns 0.3703 ns 0.3092 ns 1.00 0.00
ToUpperInvariant toupper_tolower hello 20.406 ns 0.3830 ns 0.3583 ns 0.85 0.02

@GrabYourPitchforks
Copy link
Member Author

@jozkee @layomia Can you take a look at the commit f3e0b7c, which is part of this PR? I want to confirm I'm not changing the intended meaning of the code.

@GrabYourPitchforks
Copy link
Member Author

CI failures seem to be known issues.

@CarolEidt
Copy link
Contributor

To answer (at in part) the startup question, the intrinsics are recognized in the importer, and AFAICT there are none that are only expanding when optimizing - @dotnet/jit-contrib are you aware of any issues with recognizing intrinsics in Tier0?

@AndyAyersMS
Copy link
Member

there are none that are only expanding when optimizing

Actually it's the other way round -- we generally won't expand intrinsics at Tier0/minopts:

    // Under debug and minopts, only expand what is required.
    if (!mustExpand && opts.OptimizationDisabled())
    {
        *pIntrinsicID = CORINFO_INTRINSIC_Illegal;
        return retNode;
    }

Must expand is only true for the recursive HW intrinsic methods and for a few of the old-style intrinsics.

@CarolEidt
Copy link
Contributor

Thanks @AndyAyersMS - do you think that the lack of expansion for intrinsics in minopts/tier0 should inform their use?

FWIW, I believe that all of the hw intrinsics are expanded, except those that are not directly supported (i.e. only expanded when mustExpand is true), as this code is executed before the part you cite above:

            if ((ni > NI_HW_INTRINSIC_START) && (ni < NI_HW_INTRINSIC_END))
            {
                GenTree* hwintrinsic = impHWIntrinsic(ni, clsHnd, method, sig, mustExpand);

                if (mustExpand && (hwintrinsic == nullptr))
                {
                    return impUnsupportedHWIntrinsic(CORINFO_HELP_THROW_NOT_IMPLEMENTED, method, sig, mustExpand);
                }

                return hwintrinsic;

The SIMD intrinsics seem to be recognized independent of optimization level also.

@AndyAyersMS
Copy link
Member

The HW intrinsics are expanded in minopts/Tier0 only when they're seen as "recursive" methods; otherwise they're not expanded. So the instructions are not inlined into calling methods like they are when optimizing.

Other new-style intrinsics (like Type.IsValueType which I think is what prompted the questions above) don't behave this way and don't get recognized as intrinsics at minops/Tier0

do you think that the lack of expansion for intrinsics in minopts/tier0 should inform their use?

Not quite sure what you are asking.

Arguably we should expand intrinsics at Tier0; it would probably lead to smaller and faster code and better jit throughput (see #9120). But then minopts and Tier0 would diverge...

@GrabYourPitchforks
Copy link
Member Author

I don't believe TextInfo.Invariant is recognized as an intrinsic per se. Wouldn't this be going through the normal "evaluate static ctors and burn in the values of static readonly fields" mechanism, whatever that mechanism might be?

@AndyAyersMS
Copy link
Member

If the class constructor have run before jitting, we'll burn static readonly values into codegen for numeric types, and optimize based on the actual type of the readonly statics for ref types (devirtualization, type tests, ...).

There is more we could do here for immutable or partially immutable types, eg static readonly array lengths, string lengths, string contents.

A static readonly ref type's fields are mutable, so we can't use them for optimization; likewise for static readonly structs or static readonly array elements.

@GrabYourPitchforks
Copy link
Member Author

@AndyAyersMS Sure. I think I'm using incorrect terminology. In the assembly below, it's burning into the codegen not a reference to the TextInfo.Invariant object, but to the TextInfo.Invariant field (line 2 below), which still needs to be dereferenced (line 3 below) to get at the real object.

public string ToUpperInvariant(string s) => s.ToUpperInvariant();
00007ffd`0075a280 8b0a            mov     ecx,dword ptr [rdx]
00007ffd`0075a282 48b930267a1ff9010000 mov rcx,1F91F7A2630h
00007ffd`0075a28c 488b09          mov     rcx,qword ptr [rcx]
00007ffd`0075a28f 3909            cmp     dword ptr [rcx],ecx
00007ffd`0075a291 e962e2d8ff      jmp     CLRStub[MethodDescPrestub]@7ffd004e84f8 (00007ffd`004e84f8)

The concern AFAIK was whether introducing a static ctor on TextInfo would harm Tier 0 compilation. I don't quite know how to answer that question. The steady-state codegen looks quite optimal though, so I'm pleased with that at least. :)

@AndyAyersMS
Copy link
Member

Addresses of statics are usually baked in to code -- for non-generic classes anyways.

Perhaps we should also look at the impact on R2R codegen? Prejitted code always has to check for class initialization.

I can run Tier0 and R2R diffs for you, but may not have results until tomorrow.

@GrabYourPitchforks
Copy link
Member Author

Perhaps we should also look at the impact on R2R codegen?

I'm open to whatever tests you think might be warranted. Right now I don't know how to run these tests (or even what tests to run!), but if you care to point me to them then I can take that under consideration for future changes.

@AndyAyersMS
Copy link
Member

Diff summary here. Diffs themselves are too large to share -- I can get you specific method diffs if you want.

Overall impact on size is pretty minimal.

Tier0/Tier1 diffs for SPC sometimes include a few spurious diffs; I think that may be the case here.

@GrabYourPitchforks GrabYourPitchforks merged commit 036fb07 into dotnet:master Feb 13, 2020
@GrabYourPitchforks GrabYourPitchforks deleted the toupper_tolower branch February 13, 2020 17:40
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants