Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

String.Join optimization for single item lists#1460

Merged
AlexGhiondea merged 1 commit into
dotnet:masterfrom
bbowyersmyth:StringJoin
Sep 17, 2015
Merged

String.Join optimization for single item lists#1460
AlexGhiondea merged 1 commit into
dotnet:masterfrom
bbowyersmyth:StringJoin

Conversation

@bbowyersmyth
Copy link
Copy Markdown

Returns existing string instance for String.Join when there is only one item in the list. Optimizations for when IEnumerable<string> is a IList<string>.
I've tried to match the coding style in this file but it is a bit variable. CoreFX System.Runtime tests have been run on these changes.

It is common to have lists that may have multiple items but typically have only one. Running a String.Join on them will needlessly create a new string.
One example is http header parsing.

https://github.com/aspnet/HttpAbstractions/blob/dev/src/Microsoft.AspNet.Http.Extensions/ParsingHelpers.cs#L537

Some headers support multi values but will usually only have one. In the AspNet implementation all headers are in a IEnumerable to simplify the interfaces.

This PR is mainly about reducing allocations but there are some immediate performance benefits to go along with it. Perf tests done with 1M iterations.

String[] Join

  • Early exit when there is only one item. Avoids new string allocation. This is basically the reverse of 11cddd0#diff-d2a641ec9b77a5ea1cc985bdec3e74ef
  • Store value[stringToJoinIndex] in a local variable to avoid double array lookup.
Test value Before After Faster
string[1] 46 13 254%
string[5] 232 225 3%
string[100] 4683 4564 3%

IEnumerable<String> Join

  • Removed null separator normalisation to String.Empty. StringBuilder.Append handles nulls and is the first check in that function. So passing null is actually more desirable.

IEnumerable<String> Join when values is an IList<String>

  • Use IList interface for Count and index property access. Same optimisation as many of the Linq methods. Early exit for when Count is 0 and 1, avoids new string allocation.
  • Use for loop for iteration to avoid enumerator creation
Test value Before After Faster
List(1) 89 14 538%
List(5) 212 145 46%
List(100) 3570 2565 39%

IEnumerable<String> Join when values is not an IList<String> (used Queue<String> for testing)

  • Removed null checks on en.Current. As above, Append handles nulls and this removes the double property access and branch for what should be a rarer case.
Test value Before After Faster
Queue(1) 95 45 113%
Queue(5) 240 182 31%
Queue(100) 4021 3236 24%

Edit: IList<string> path is no longer included in this PR

Comment thread src/mscorlib/src/System/String.cs Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❗ Did this just drop the second value? Seems like you either need a do/while loop, or add the following before this loop.

result.Append(separator);
result.Append(en.Current);

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sharwell , incomplete copy from my test harness. I'll add that later as well as a gist which might make it easier to confirm perf.

@Eyas
Copy link
Copy Markdown

Eyas commented Aug 30, 2015

Since you are only depending on the read-only properties of IList, is it better to make this IReadOnlyList-specific instead of IList-specific?

This means we will see these improvements for List<T> (which implements IReadOnlyList<T>), as well as ImmutableList<T> and ImmutableArray<T>, etc.

This means there would be no slow-down for some custom mutable list-like implementations that only inherit from IList. Still, might be worthwhile.

@sharwell
Copy link
Copy Markdown

@Eyas ImmutableList<T> and ImmutableArray<T> both implement IList<T>.

@Eyas
Copy link
Copy Markdown

Eyas commented Aug 30, 2015

Ah yes, bad example. But my suggestion stands. Why not use the minimal interface that provides only the functions needed to get the perf boost?

The alternative is to force an implementer to always implement IList to be treated as a first class citizen, even of the mutating methods are throw-only

@bbowyersmyth
Copy link
Copy Markdown
Author

Interesting suggestion @Eyas I used IList mainly for consistency with other areas such as Linq and I'm not sure how popular IReadOnlyList has become yet.

Question for the maintainers, any preferences on this going forward?

@ayende
Copy link
Copy Markdown

ayende commented Aug 31, 2015

Aren't there implications about interface calls vs. direct call here, WRT
performance?

*Hibernating Rhinos Ltd *

Oren Eini* l CEO l *Mobile: + 972-52-548-6969

Office: +972-4-622-7811 *l *Fax: +972-153-4-622-7811

On Mon, Aug 31, 2015 at 5:53 AM, Bruce Bowyer-Smyth <
notifications@github.com> wrote:

Interesting suggestion @Eyas https://github.com/Eyas I used IList
mainly for consistency with other areas such as Linq and I'm not sure how
popular IReadOnlyList has become yet.

Question for the maintainers, any preferences on this going forward?


Reply to this email directly or view it on GitHub
#1460 (comment).

@JonHanna
Copy link
Copy Markdown

There's a performance difference in the test of as ISomeInterface<T> when ISomeInterface is variant and when it isn't, presumably due to the fact that a given object could be implementing such an interface or also the same interface but for a type that works with the variance.
IReadOnlyList<T> is covariant, so it will suffer here. Whether it suffers enough to outweigh the possible benefits would need measurements.

@bbowyersmyth
Copy link
Copy Markdown
Author

@ayende forcing List into either path the timings are:

Test Value IList IEnumerable
List(1) 14 37
List(5) 145 195
List(100) 2565 3327

If list.Count was cached into a local variable to avoid multiple interface calls to that:

Test Value IList
List(1) 14
List(5) 127
List(100) 2267

So that might be worth doing which would leave one Count and multiple indexer accesses.
Was there a test you had in mind?

@ayende
Copy link
Copy Markdown

ayende commented Aug 31, 2015

No, I meant List vs. IList vs. IReadOnlyList

*Hibernating Rhinos Ltd *

Oren Eini* l CEO l *Mobile: + 972-52-548-6969

Office: +972-4-622-7811 *l *Fax: +972-153-4-622-7811

On Mon, Aug 31, 2015 at 3:20 PM, Bruce Bowyer-Smyth <
notifications@github.com> wrote:

@ayende https://github.com/ayende forcing List into either path the
timings are:
Test Value IList IEnumerable List(1) 14 14 List(5) 145 195 List(100) 2565
3327

If list.Count was cached into a local variable to avoid multiple
interface calls to that:
Test Value IList List(1) 14 List(5) 127 List(100) 2267

So that might be worth doing which would leave one Count and multiple
indexer accesses.
Was there a test you had in mind?


Reply to this email directly or view it on GitHub
#1460 (comment).

@JonHanna
Copy link
Copy Markdown

I'm curious as to what results you get if you have only the improvement for the single-element enumerable case, and remove the IList<string> path entirely. In particular the effect when the IList<string> in question is string[].

@bbowyersmyth
Copy link
Copy Markdown
Author

With the list.Count caching change.

Test Value List<T> IReadOnlyList<T> No IList path at all
List(1) 7 131 37
List(5) 112 248 196
List(100) 1987 2342 3274

There is a specific Join overload for string[] but if the value arrived as an IList<T>

Test Value Before (existing) IList path No IList path at all
(IList<T>) string[1] 83 43 28
(IList<T>) string[5] 182 165 164
(IList<T>) string[100] 3336 2455 2753

@bbowyersmyth
Copy link
Copy Markdown
Author

Committed the count local variable cache. The IList<string> try cast still seems like the best general improvement so we go with that or play it safe and just stick with the IEnumerable path.

Gist for perf testing:
https://gist.github.com/bbowyersmyth/c1c55444fb102281e3c9

@bbowyersmyth
Copy link
Copy Markdown
Author

I'm thinking that the code should just stick with IEnumerable path for now (which is still a win). There is no guarantee that IList indexers have to return values in the same order as their enumerator. While it is probably fine for Linq to work on that assumption, System.String should be more correct.
Unless there are any other thoughts I'll resubmit this commit later this afternoon.

@bbowyersmyth
Copy link
Copy Markdown
Author

Replaced the commit to remove the IList try cast.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this was removed because the StringBuilder.Append() correctly handle the null case. Is that correct?

Did this particular change impact the results in a significant way?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Mentioned some of the reasoning in the PR message.
There is not a lot in it, for a null separator on List<string>(100) it is 3227ms with it and 3186ms without it. It's just not protecting from anything as Append does the check again anyway.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. As long as we document somewhere that this was an intentional change to remove the null check 😄

@bbowyersmyth
Copy link
Copy Markdown
Author

Addressed the feedback with variable name, nesting, and null handling comment

@AlexGhiondea
Copy link
Copy Markdown

👍

Thank you @bbowyersmyth for the change! If you don't mind, could you squash the commit that adressed the PR feedback?

I prefer having separate commits while the PR is open as that makes it easier to see what has changes after each feedback iteration but I don't think they add value once we are done with the feedback :).

@bbowyersmyth
Copy link
Copy Markdown
Author

@AlexGhiondea for sure. Done.

@AlexGhiondea
Copy link
Copy Markdown

Thanks @bbowyersmyth !

Waiting for a second 👍 in order to merge it in!

@stephentoub
Copy link
Copy Markdown
Member

LGTM. Do we have tests in corefx for these special cases?

@bbowyersmyth
Copy link
Copy Markdown
Author

@stephentoub Yes they were added via dotnet/corefx#2839

AlexGhiondea added a commit that referenced this pull request Sep 17, 2015
String.Join optimization for single item lists
@AlexGhiondea AlexGhiondea merged commit d176041 into dotnet:master Sep 17, 2015
@bbowyersmyth bbowyersmyth deleted the StringJoin branch September 18, 2015 06:22
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
String.Join optimization for single item lists

Commit migrated from dotnet/coreclr@d176041
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants