Skip to content

Align ROS.Split handling of zero-length target strings with String.Split#37742

Closed
layomia wants to merge 2 commits intodotnet:masterfrom
layomia:ros_split
Closed

Align ROS.Split handling of zero-length target strings with String.Split#37742
layomia wants to merge 2 commits intodotnet:masterfrom
layomia:ros_split

Conversation

@layomia
Copy link
Contributor

@layomia layomia commented Jun 11, 2020

Follow up of #295.

First commit is clean-up. Review can start from second commit.

cc @bbartels

@layomia layomia added this to the 5.0 milestone Jun 11, 2020
@layomia layomia self-assigned this Jun 11, 2020
public static void DefaultSpanSplitEnumeratorBehavior()
{
var charSpanEnumerator = new SpanSplitEnumerator<string>();
var charSpanEnumerator = new SpanSplitEnumerator<char>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var charSpanEnumerator = new SpanSplitEnumerator<char>();
ReadOnlySpan<char> charSpanEnumerator = new SpanSplitEnumerator<char>();

To align with the other changes from var to ReadOnlySpan<T>.
Same below, but for some reason cannot add a suggestion, as it is not part of the diff.

_startCurrent = _startNext;

int separatorIndex = _splitOnSingleToken ? slice.IndexOf(_separator) : slice.IndexOf(_separators);
int separatorIndex;
Copy link
Contributor

@bbartels bbartels Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the motivation for this change? To improve runtime perf of Split(null) to make it not O(n^2)?

Oh, nevermind, i did not realize string.Split("") just returns the original string.

@GrabYourPitchforks
Copy link
Member

To be honest we should also consider having these methods return a type other than the generic SpanSplitEnumerator<char>. There are may different pivots that string.Split currently has, and we'll probably want MemoryExtensions.Split to have many of those same pivots:

  • Splitting on whitespace
  • Splitting on single char
  • Splitting on any of a set of chars
  • Splitting on a multi-char substring
  • Splitting on any of a set of multi-char substrings
  • All of the above, but with a StringSplitOptions

All of these are likely to have different implementations. But if we have a single enumerator type responsible for handling every one of these cases, then we need to complicate the enumerator type itself, including increasing the size of the struct to handle all possible cases and including a gigantic switch statement as part of the MoveNext routine. This will negatively impact performance and will also make SpanSplitEnumerator<byte> unnecessarily bloated even though it might not require feature parity.

@bbartels
Copy link
Contributor

bbartels commented Jun 11, 2020

I fully agree, many of these intricacies solely pertain to a char version of SpanSplitEnumerator, so having a separate one would probably make things easier going forward. The initial #295 PR was based on a multi Enumerator implementation (though for splitting on sequence/single element, not for string/non-string).

I feel like it's a bit dangerous that is currently merged into master given there are still many questions to be answered and potential amendments to the implementation. I'm not sure what guarantees dotnet provides for breaking changes between previews.

@GrabYourPitchforks
Copy link
Member

I'm not sure what guarantees dotnet provides for breaking changes between previews.

There aren't many guarantees made between previews, but I agree there are too many unanswered questions regarding the current implementation. I'll ping some folks offline to get their thoughts.

@bbartels
Copy link
Contributor

I feel bad for @davidfowl though, he seemed quite excited about this finally being merged 😅

@bbartels
Copy link
Contributor

@GrabYourPitchforks I finished up a fix for #37746, should I still submit a PR or wait for a decision to be made regarding fundamental changes the the Enumerator?

@GrabYourPitchforks
Copy link
Member

Hi @bbartels, that's awesome that you've made such quick progress on it!

I spoke with some others on the team and we think it's best that we run this through review again, backing out the original PR in the meantime. This should not be seen as a negative reflection on the code. Sometimes the unfortunate reality is that we don't notice these things until they're checked in and we actually try to use them. :(

And now that we have your code to work with, we'll be more informed during the review process. I suspect we'll be able to use the majority of this as a solid foundation for creating the final API surface and behavior. Please remain involved with this API if you're still interested!

@bbartels
Copy link
Contributor

I completely understand, I'm purely motivated by making .NET better and learning in the process! I am honestly happy that it's not being rushed and will be sent through review again. I feel like it's much more important to have a well thought out type in the runtime, than one where considerations have not been properly discussed! Definitely looking forward to staying involved; I'm off of Uni at the moment, so have more time than I know what to do with! Looking forward to the review!

@layomia
Copy link
Contributor Author

layomia commented Jun 11, 2020

Closing based on above discussions.

@layomia layomia closed this Jun 11, 2020
}

[Theory]
[InlineData("a,b,c", null, new[] { "a,b,c" })]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty string (value) ?

@davidfowl
Copy link
Member

😢

@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
@layomia layomia deleted the ros_split branch May 18, 2021 07:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants