This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Slightly increase throughput of string.Concat(object[])#6547

Merged: jkotas merged 6 commits into dotnet:master from jamesqo:concat-throughput on Aug 2, 2016

@jamesqo jamesqo commented Aug 1, 2016

Previously, the method used a regular string array to store the intermediate ToString() results of each object. As a result, each write to the array triggered a covariant type check to ensure the string could actually be stored, even though it always can be since string is sealed (see #6537 for more on this). I've changed the intermediate array to a RefAsValueType<string>[] instead (the wrapper is copied from System.Collections.Immutable, which employs the same trick), which circumvents these checks.
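To make the covariance check concrete, here is a minimal sketch. The RefAsValueType<T> below is a hypothetical stand-in written for illustration (the type actually added in the PR may differ), and CovarianceDemo and its method names are invented here. The first method shows why array stores are checked at run time; the second shows the wrapper trick, which relies on the fact that arrays of value types are not covariant.

```csharp
using System;

// Hypothetical stand-in for the wrapper described above; not the PR's actual source.
internal struct RefAsValueType<T>
{
    public T Value;
}

public static class CovarianceDemo
{
    // A string[] can be viewed as object[], so every store must be type-checked at run time.
    public static bool StoreThrows()
    {
        object[] objects = new string[1];
        try
        {
            objects[0] = 42; // a boxed int is not a string
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    // Storing into an array of structs needs no covariance check:
    // the element type of a value-type array is always exact.
    public static int TotalLength(object[] values)
    {
        var wrapped = new RefAsValueType<string>[values.Length];
        int total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            wrapped[i].Value = values[i]?.ToString() ?? string.Empty;
            total += wrapped[i].Value.Length;
        }
        return total;
    }
}
```

Whether the extra indirection pays off depends on the JIT and the workload, which is what the rest of the thread debates.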

Other changes/notes

  • I used totalLengthLong and checked whether it was greater than int.MaxValue at the end, analogous to what was done in #4559 (Avoid defensive copy in String.Concat(string[])), instead of checking for overflow at each iteration of the loop.
  • Additional optimization where we short-circuit and return string.Empty if all of the ToStrings are null/empty.
  • I also took the opportunity to clean up the code formatting and Hungarian notation, since the method in its current state is kinda horrendous.
  • I noticed there was a new Internal/ directory in mscorlib due to #6205 (Add EnvironmentAugments to coreclr), so I put the RefAsValueType utility type in there instead of the System/ namespace. Since it's such a huge assembly, I think it's best to separate the internal types from the publicly exposed types.
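The first two bullets can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: ConcatSketch is an invented name, and a plain char[] buffer stands in for the runtime-internal allocation and FillStringChecked calls that the real method uses.

```csharp
using System;

public static class ConcatSketch
{
    public static string Concat(object[] values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));

        var strings = new string[values.Length];
        long totalLengthLong = 0;

        for (int i = 0; i < values.Length; i++)
        {
            // Null elements and null ToString() results both count as empty.
            string s = values[i]?.ToString() ?? string.Empty;
            strings[i] = s;
            totalLengthLong += s.Length; // a long cannot overflow here
        }

        // Single overflow check after the loop instead of one per iteration.
        if (totalLengthLong > int.MaxValue) throw new OutOfMemoryException();

        // Short-circuit: nothing to copy, so skip the allocation entirely.
        if (totalLengthLong == 0) return string.Empty;

        var buffer = new char[(int)totalLengthLong];
        int position = 0;
        foreach (string s in strings)
        {
            s.CopyTo(0, buffer, position, s.Length);
            position += s.Length;
        }
        return new string(buffer);
    }
}
```

For example, ConcatSketch.Concat(new object[] { "a", null, 1, "" }) yields "a1", and an input whose elements are all null or empty returns string.Empty without allocating a buffer.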

Performance impact

Although this change was kind of hard to benchmark (presumably due to the GC allocations and all the other stuff going on in the method), in general there seems to be a speedup of about 5-6%. FillStringChecked seems to be a larger bottleneck in this method, so it might be worth looking into applying AggressiveInlining there later.

(note: I initially posted that the speedup was ~50% in #6537, but that was because I was using an earlier, slightly different implementation that avoided calling FillStringChecked for nulls, and testing that with lots of nulls. That wasn't accurate.)

cc @jkotas @bbowyersmyth @mikedn

jamesqo (Author) commented Aug 1, 2016

Oh and here is the code I was using to benchmark... I both collected its console output, and ran it through PerfView.

Here is an example run (though it's somewhat shaky):

Baseline
00:00:24.2704372
00:00:23.0148639
00:00:22.4286926
00:00:24.0373035
00:00:23.0377945

Experimental
00:00:23.3305212
00:00:21.4681526
00:00:20.8869711
00:00:23.4251711
00:00:20.0781215

On some runs all of the experimental values are lower (if only slightly) than the baseline ones.
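As a rough way to summarize the run above, the following sketch averages the five timings in each group and computes the relative difference. TimingSummary is a name invented for this illustration. For this particular run the means come out to about 23.36 s and 21.84 s, i.e. roughly 6.5% faster, though with five noisy samples that figure is only indicative.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class TimingSummary
{
    // Parses "hh:mm:ss.fffffff" values and returns their mean in seconds.
    public static double MeanSeconds(params string[] runs) =>
        runs.Average(r => TimeSpan.Parse(r, CultureInfo.InvariantCulture).TotalSeconds);

    // Relative improvement of the experimental mean over the baseline mean.
    public static double RelativeSpeedup()
    {
        double baseline = MeanSeconds(
            "00:00:24.2704372", "00:00:23.0148639", "00:00:22.4286926",
            "00:00:24.0373035", "00:00:23.0377945");
        double experimental = MeanSeconds(
            "00:00:23.3305212", "00:00:21.4681526", "00:00:20.8869711",
            "00:00:23.4251711", "00:00:20.0781215");
        return (baseline - experimental) / baseline;
    }
}
```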

mikedn commented Aug 1, 2016

> I used totalLengthLong and checked if it was greater than int.MaxValue at the end, analogous to what was done in #4559, instead of checking for overflow at each iteration of the loop.

How many arguments do you need to pass to Concat for this particular change to generate a measurable improvement? Same question for 32 bit architectures where long is slower.

> Additional optimization where we short-circuit and return string.Empty if all of the ToStrings are null/empty.

Probably that's a good change but the chances that someone uses this particular string.Concat overload and that all arguments are null/empty seem pretty low.

jkotas (Member) commented Aug 1, 2016

I do not think that the slight improvement is worth the extra code. This should be taken care of by optimizing this in the JIT (#6537).

jamesqo (Author) commented Aug 1, 2016

@mikedn I am aware long is slower on 32-bit architectures since it takes two registers to implement, and branch prediction will probably minimize the cost of the check anyway, even if it's made within the loop. I made the change mainly for readability: if (.. > int.MaxValue) is self-explanatory and doesn't need a "check for overflow" comment. I also did it for symmetry with the other PR, #4559. If the array is so big that this makes a noticeable difference on 32-bit, then we have a bigger problem. ;)

> Probably that's a good change but the chances that someone uses this particular string.Concat overload and that all arguments are null/empty seem pretty low.

Nullable<T>.ToString() returns string.Empty if HasValue == false: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Nullable.cs#L74

I agree it's pretty unlikely that all of the ToStrings will be null/empty as well, but in the contrived case where it actually does happen, the cost of avoiding an object allocation and the FillStringChecked / Buffer.Memmove call is so low (one test _, _ and jne/je) that I think it's worth it.
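A small sketch of the inputs being discussed here: values whose ToString() yields an empty or null string. EmptyToStringDemo and the Blank class are hypothetical names used only for this example.

```csharp
using System;

public static class EmptyToStringDemo
{
    // A type whose ToString() returns null, for illustration only.
    private sealed class Blank
    {
        public override string ToString() => null;
    }

    // Nullable<T>.ToString() returns an empty string when HasValue is false.
    public static string NullableText()
    {
        int? missing = null;
        return missing.ToString();
    }

    // Every element contributes nothing, so the result is an empty string.
    public static string ConcatAllEmpty()
    {
        return string.Concat(new object[] { null, string.Empty, new Blank() });
    }
}
```

Both methods return a zero-length string; the proposed short-circuit is aimed at the second case, where no characters ever need to be copied.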

jamesqo (Author) commented Aug 1, 2016

@jkotas What are the chances that could happen soon (optimizing in the JIT)? Also it may be useful to add the RefAsValueType in other scenarios where the type isn't sealed and the JIT can't verify it isn't exactly of type T[].

jkotas (Member) commented Aug 1, 2016

> What are the chances that could happen soon?

It depends on if somebody volunteers to implement it.

> it may be useful to add the RefAsValueType in other scenarios

I am pretty sure that you can easily find hundreds of places in coreclr or corefx where you can manually apply this optimization. That's why it is better done in the JIT: it does not make sense to add code that is harder to understand and maintain everywhere for slight performance gains just because the JIT does not do an optimization today.

jamesqo (Author) commented Aug 1, 2016

@jkotas

> I am pretty sure that you can easily find hundreds of places in coreclr or corefx where you can manually apply this optimization.

I'm only proposing to do this for codepaths that are called frequently; of course I wouldn't want to litter lots of places where perf doesn't really matter with RefAsValueType.

> That's why it is better done in the JIT

As @mikedn mentioned in the other thread it's going to be very hard / next to impossible to do this for non-sealed types where we can't see the allocation of the array. For example, we almost certainly won't be able to do this for ArrayPool<object>.Shared.Rent(length) since Rent isn't going to be inlined (even if it was a sealed method).

At any rate, I guess I'll remove it from this PR for now. If any scenario such as the one I just mentioned comes up later, then I may re-add it. I'll limit this PR to the minor changes @mikedn commented on + the formatting changes.

edit: OK, I've reduced the changes in String to just using a long for counting chars during the loop / short-circuiting for empty strings. I've also kept the InternalSources for the EnvironmentAugments class, since in the future I'll probably introduce another utility class into Internal/ which does that anyways.

mikedn commented Aug 1, 2016

> I also did it for symmetry with the other PR #4559.

I don't think symmetry with other code is a good argument. It's different code, written and reviewed at a different time by different people.

For example, the loop in #4559 is much simpler and has a high chance of being evaluated using only registers on x86. But the loop you have here is more complex and there's a higher chance for a variable to be spilled to memory. Then you're not only adding a single adc instruction (which would be fine as a replacement of a compare and a conditional jump), you're adding a memory access too.

> If the array is so big where that makes a noticeable difference on 32-bit, then we have a bigger problem. ;)

I don't understand this. The cost is the same no matter how big the array is.

> As @mikedn mentioned in the other thread it's going to be very hard / next to impossible to do this for non-sealed types where we can't see the allocation of the array.

I didn't say it's very hard/next to impossible. It's problematic because the simplest solution isn't currently an option as it requires verification to be enabled. There are other solutions that could optimize this particular code, they're slightly more complicated but not hard/impossible.

> Also it may be useful to add the RefAsValueType in other scenarios where the type isn't sealed and the JIT can't verify it isn't exactly of type T[].

It would be useful to have this in List<T> but it's difficult to use this trick there without causing other problems.

jkotas (Member) commented Aug 1, 2016

In addition to likely being slightly slower on 32-bit platforms, the change to use a long that is checked after the loop is also a subtle observable behavior change.
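The comment does not spell out what the behavior change is. One possible reading, sketched below under that assumption, is that moving the check after the loop lets every ToString() call run before the failure is reported, whereas the per-iteration check stops at the first element that pushes the total over the limit. OverflowCheckOrder, Counted, and the small limit parameter (standing in for int.MaxValue so the demo runs with tiny strings) are all hypothetical.

```csharp
using System;

public static class OverflowCheckOrder
{
    // Counts how often ToString() is invoked; each call returns four characters.
    private sealed class Counted
    {
        public static int Calls;
        public override string ToString() { Calls++; return "xxxx"; }
    }

    // Returns how many ToString() calls ran before the length check failed.
    public static int CallsBeforeFailure(bool checkInsideLoop, int limit)
    {
        Counted.Calls = 0;
        object[] values = { new Counted(), new Counted(), new Counted() };
        long total = 0;
        try
        {
            foreach (object value in values)
            {
                total += value.ToString().Length;
                if (checkInsideLoop && total > limit) throw new OutOfMemoryException();
            }
            if (total > limit) throw new OutOfMemoryException();
        }
        catch (OutOfMemoryException)
        {
            // Swallowed so the caller can inspect the call count.
        }
        return Counted.Calls;
    }
}
```

With a limit of 6, the in-loop check fails after the second ToString() call, while the after-loop check only fails once all three calls have run, so any side effects of the later calls become observable.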

The part of the change that removes the redundant array access is a clear improvement because it makes the code both smaller and faster.

Review comment on src/mscorlib/src/System/String.cs (outdated), by a Member:

This may not be worth the comment. It is a well-known fact that you get best performance in for-loops over arrays when the limit is length of the array that is being iterated on, and not some other derived value.
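The loop shapes being contrasted look roughly like this. LoopBounds and its methods are invented for illustration, and whether the JIT actually removes the bounds check in a given case depends on the JIT version, so treat the comments as the general expectation rather than a guarantee.

```csharp
using System;

public static class LoopBounds
{
    // The limit is the Length of the array being indexed: the pattern the JIT
    // typically recognizes, letting it drop the per-access bounds check.
    public static int SumLengths(string[] items)
    {
        int total = 0;
        for (int i = 0; i < items.Length; i++)
        {
            total += items[i].Length;
        }
        return total;
    }

    // The limit is a separate value: the JIT generally cannot prove that
    // i < items.Length, so each items[i] access keeps its bounds check.
    public static int SumLengths(string[] items, int count)
    {
        int total = 0;
        for (int i = 0; i < count; i++)
        {
            total += items[i].Length;
        }
        return total;
    }
}
```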

jamesqo (Author) commented Aug 2, 2016

@mikedn

> I didn't say it's very hard/next to impossible. It's problematic because the simplest solution isn't currently an option as it requires verification to be enabled.

I was referring specifically to the case of ArrayPool<object>.Shared.Rent. In order to eliminate checks for that, we'd at a minimum have to devirtualize Shared.Rent and then look into that method and make sure all returns were strictly of type object[] in spite of it not being inlined. That would probably be challenging.

> It would be useful to have this in List<T> but it's difficult to use this trick there without causing other problems.

Agreed, e.g. CopyTo would be slower since Array.Copy would no longer be an option. I think ImmutableArray<T>.Builder once did this (it was even mentioned in this blog post) but that got shelved once MoveToImmutable was added.

@jkotas Oh wow, good catch, that was a pretty subtle compat issue (although it's likely never going to happen). I've updated the pull request to the old behavior of checking each iteration.

jkotas (Member) commented Aug 2, 2016

LGTM. Thanks!

jkotas (Member) commented Aug 2, 2016

@dotnet-bot test Windows_NT x64 Release Priority 1 Build and Test please

@jkotas jkotas merged commit 6773ee1 into dotnet:master Aug 2, 2016
@jamesqo jamesqo deleted the concat-throughput branch August 2, 2016 14:46
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

4 participants