Add String.Split overloads that take a single char and string separator#895
Conversation
There was a problem hiding this comment.
This should be SecurityCritical, the safety of this methods depends on up stack code being correct (i.e. if you pass in a bad seperatorsLength you violate memory safety).
|
Thanks, @justinvp. This looks like a great first start. I'm going to want to take another in-depth look (as well as review the tests) before signing off. I'm especially interested in understanding if doing something with unsafe code would end up being better than the StringSeperatorArray thing. Did you happen to play around with that? We also need to figure out the process for taking this type of change (e.g. do we have a future branch?) so it will probably be a few days before I'm comfortable merging this. I'm already working internally to figure out what the right strategy is. |
d88b61d to
862da1e
Compare
There was a problem hiding this comment.
Can this field be eliminated if you place constraints on the other two fields? (e.g. _separator != null implies 'single')
There was a problem hiding this comment.
The problem is _separator can be null, that's a valid value.
There was a problem hiding this comment.
When you construct this you could convert a null separator to string.Empty, and I think that would give you the behavior you want?
There was a problem hiding this comment.
Ah, @ellismg, I see what you mean. Since separators that are null or empty are treated the same by Split, string.Empty can be set so that _separator != null implies 'single'. Thanks for the feedback guys!
|
Thanks for the initial feedback, @ellismg and @pharring!
I went with the I haven't had a chance to run before/after comparisons to see what the perf difference is, but I'll do that and let you know. And I'll look into the unsafe approach. |
862da1e to
8adea6c
Compare
8adea6c to
1827eb2
Compare
|
Hey, what happened to this? |
|
Sorry, I was trying to replace the last commit with another commit from another branch and GitHub ended up closing the PR automatically. I'll try to reopen, if not, I'll open a new PR. |
|
The unsafe approach for As an alternative, I got rid of the As an aside, it'd be nice to investigate whether it's possible to get rid of or minimize/mitigate the |
1130946 to
32554a8
Compare
Sounds good, thanks for giving it a shot.
This is something I want to explore as well. It's probably reasonable to bank the wins from this new overload and do the investigation as a separate PR. I think the best place to start is writing some benchmarks where we could measure both time and GC. Right now, the state of the art for doing this seems to be:
We are working on getting a system set up so you just write benchmarks in XUnit and everything else is taken care of (and run in the CI system so we ensure we don't regress) but we probably won't be there for a few months. If you want to take the lead here, I'm happy to let you do so, and we can bounce ideas off one another (perhaps in another issue or PR?) |
There was a problem hiding this comment.
Could be
internal unsafe String[] SplitInternal(char[] separator, int count, StringSplitOptions options)
?
There was a problem hiding this comment.
Could be. I don't have a preference either way.
There was a problem hiding this comment.
less brackets - less indentation - better readability
There was a problem hiding this comment.
The flip side is this:
If unsafe is an implementation detail then why mark the methods unsafe?
A moot point, but I digress.
There was a problem hiding this comment.
Is this better as Contract.Assert(!string.IsNullOrEmpty(separator), $"!string.IsNullOrEmpty({nameof(separator)})"); or is string interpolation not available?
Also, you seem to bounce between String and string, which is "correct"?
There was a problem hiding this comment.
String interpolation can't be used as mscorlib still needs to be compiled with the older C# compiler.
As for String vs. string, the code style in this file is inconsistent all over the place, but String would probably be the most consistent with other uses in this file (this is in contrast with the corefx repo where string is preferred).
The community is very interested in addressing the performance aspect of .NET Core. |
|
+1 on merging this in. We maintain a |
|
The main thing I've been waiting on is whether this needs to be resubmitted against (a yet to be created) "future" branch. Though, perhaps this PR can be merged into coreclr/master, with the actual reference assembly contract changes submitted separately against corefx/future? @ellismg, What do you think? (Also, I won't be able to get to this until next week when I'm back in front of a computer, but I want to ensure the additions to model.xml here are in the "right" order, to be consistent with the rest of the file.) |
32554a8 to
b454cee
Compare
|
@justinvp, can a similar overload with single char added for string.Join too? The current approach is to call ToString() on char, which is bit inefficient, ex. |
|
Just curious, is this still going to be merged? It's been a few months since the last commit... |
|
@jasonwilliams200OK We should keep this PR focused on the APIs covered with https://github.com/dotnet/corefx/issues/1513. Another one should be filed for additional public APIs. @justinvp I think this change can go into coreclr\master but as you point out we cannot expose them in System.Runtime contract yet until these changes can actually be brought into the full .NET Framework as well so we would need to either create a future version of System.Runtime contract ref or put it into the corefx/future branch (I need to figure out how we add such APIs on core types in general as we are still working out the whole netstandard stuff). @ellismg @AlexGhiondea are one you going to follow-up about getting these added to the .NET Framework? |
|
@weshaggard, thanks. The String.Split issue was filed here: https://github.com/dotnet/corefx/issues/5552 followed by the PR #2945. :) |
b454cee to
7925348
Compare
|
Is there any update here? I see the last build breaking. I'm assuming this falls under the next generation, so are we just waiting until post-RC2 or RTM to consider PRs like this? |
|
I think this needs to go through API Review. @terrajobst can you help schedule this? |
|
This has already been API reviewed and approved. @terrajobst said:
https://github.com/dotnet/corefx/issues/1513#issuecomment-97279502 |
|
The plan is to do this, but we are going to wait to post 1.0 |
|
@ellismg, @jkotas, is this an appropriate time to get this reviewed/merged? https://github.com/dotnet/corefx/issues/2578 (another new System.Runtime API) was just merged. This is https://github.com/dotnet/corefx/issues/1513, which was approved by API review. After this is reviewed and looks good, I can port to CoreRT. |
|
(and please this very similar #2945 as well 😊) |
|
Let's see the one I have merged to go through end-to-end to see how it is going to work. |
|
Any progress on this? I am thinking of making changes to |
|
Some ideas:
|
|
@justinvp I think you should just go ahead with this PR (1). I haven't actually made any changes to Split yet, and I wouldn't want to keep this from being merged any longer. I can send in my pull request after that's done. Great job on adding these, btw :) |
|
@justinvp thanks for all your work and patience on this. I chatted with @jkotas and we think going with your option (1) is the right approach to make some progress. So for APIs that are approved and reviewed we will allow them to be added to the implementation but we need to file a tracking issue in corefx to expose them in the reference assemblies and to provide tests for them at a later time. |
|
Thanks, @weshaggard & @jkotas!
In this particular case, I've already written the tests so that they automatically "light-up" when the APIs are exposed in the reference assembly, via extension methods, so the only test-related action will be to delete those extension methods once the APIs are exposed. |
|
LGTM. Thanks @justinvp. |
|
Awesome! 😄 |
Fixes dotnet/corefx/issues/1513
Notes:
Splittests manually in a one-off app compiled against a CoreCLR mscorlib.dll with the added methods (on Windows)./cc @ellismg